Any reply received by the Office later than three months after the mailing date of this communication, even if timely filed, may reduce any
earned patent term adjustment. See 37 CFR 1.704(b).

Status 

1) [X] Responsive to communication(s) filed on 30 October 2001.
2a) [ ] This action is FINAL. 2b) [X] This action is non-final.

3) [ ] Since this application is in condition for allowance except for formal matters, prosecution as to the merits is
closed in accordance with the practice under Ex parte Quayle, 1935 C.D. 11, 453 O.G. 213.
DETAILED ACTION

1. This action is responsive to the application filed October 30, 2001.

2. The priority date considered for this application is May 31, 2001, which is the 
filing date of the provisional application no. 60/ 294,913. 

3. Claims 1-19 have been examined.



Drawings 

4. The drawings are objected to because of the following minor informalities: 

a. Figures 1A and 1B should be designated by a legend such as - Prior Art
- because only that which is old is illustrated. See MPEP § 608.02(g);

b. Figures 1A and 1B:

i. an arrowhead is missing at the bottom end of the connecting line
between blocks 102, 152 and 104, 154, respectively, and

ii. it is not clear why there is a connecting line between the output of
block 104, 154 and the input of block 102, 152 respectively;

c. Figure 2: 

i. Block 206: the terms "empty" and "unknown" should be enclosed
between quotation marks;

ii. Block 208: a question mark should be added at the end of the
clause;

iii. Block 214: "YES" and "NO" legends are missing at the outputs 
of this decision block; 

iv. Figure 2 on page 3 of 5 should be labeled - Figure 2A - and the
Figure shown on the next page, i.e., sheet 4 of 5, should be labeled - Figure 2B -
because Figure 2B is a continuation of Figure 2A;

v. "YES" and "NO" legends are missing at the outputs of decision
blocks 218, 222, 226, 230, 234;

vi. destination of output from each of the three boxes labeled "238" 
should be indicated. 



Specification 

5. The specification is objected to because of the following minor informalities:

a. the section "Brief Summary of the Invention" is missing; 

Brief Summary of the Invention: See MPEP § 608.01(d). A brief
summary or general statement of the invention as set forth in 37
CFR 1.73. The summary is separate and distinct from the abstract and is
directed toward the invention rather than the disclosure as a whole. The
summary may point out the advantages of the invention or how it solves
problems previously existent in the prior art (and preferably indicated in
the Background of the Invention). In chemical cases it should point out
in general terms the utility of the invention. If possible, the nature and
gist of the invention or the inventive concept should be set forth.
Objects of the invention should be treated briefly and only to the extent
that they contribute to an understanding of the invention.

b. The term "complier" at page 1, line 22 is mistyped; 
c. The term "servelets" at page 1, line 29 is mistyped.



Claim Objection 

6. Claims 2, 6 and 9 are objected to because of the following minor informalities:
a. Claims 2 (line 18 of claim) and 6 (line 3 of claim): the terms empty and
unknown should be enclosed between quotation marks;
b. Claims 2 (lines 31 and 34 of claim) and 9 (lines 5 and 7 of claim): the
term stack after "mappings to" should be enclosed between quotation marks.

Claim Rejections - 35 USC § 112 

7. The following is a quotation of the first paragraph of 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of making and
using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or
with which it is most nearly connected, to make and use the same and shall set forth the best mode contemplated by
the inventor of carrying out his invention.

8. Claim 3 is rejected under 35 U.S.C. 112, first paragraph, as failing to comply
with the written description requirement. The claim(s) contains subject matter which
was not described in the specification in such a way as to reasonably convey to one
skilled in the relevant art that the inventor(s), at the time the application was filed, had
possession of the claimed invention. Specifically, claim 3 recites the limitation "using
information from preceding instructions to mimic an optimizing compiler," which is found to be
not sufficiently described in the specification so as to convey to one skilled in the art
how to write code to perform such a step. First, it is noted that the specification in
section 0028, lines 3-5 shows that the preceding translation information is being used
rather than "information from preceding instructions" as recited in claim 3. Preceding
translation information is different from information from preceding instructions.
For art rejection purposes, the limitation is interpreted as "using preceding translation
information to mimic an optimizing compiler." Second, it is unclear what "to mimic an optimizing
compiler" explicitly means. The specification does not provide any detailed description
of this function.

9. Claims 4-19, which depend from claim 3, are also rejected under 35 U.S.C. 112,
first paragraph, for the same reasons.

10. The following is a quotation of the second paragraph of 35 U.S.C. 112:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject
matter which the applicant regards as his invention.

11. Claims 2-19 are rejected under 35 U.S.C. 112, second paragraph, as being
indefinite for failing to particularly point out and distinctly claim the subject matter
which applicant regards as the invention.

a. Lack of antecedent basis: 

Claim 2 recites the limitation "said class file" in line 22 of the claim.
There is insufficient antecedent basis for this limitation in the claim.

Claim 2 recites the limitation "said mappings" in line 34 of the claim
(after "setting"). There is insufficient antecedent basis for this limitation in the claim.
The limitation "said mappings" should be changed to - said stack mappings - in order
to have proper antecedent basis.

Claims 2 (in line 59 of the claim) and 14 (in line 4 of the claim) recite the
limitation "said stack values". There is insufficient antecedent basis for this limitation
in the claim.

Claims 4-19 recite the limitation "said compilation procedure" in line 1.
There is insufficient antecedent basis for this limitation in the claim.

Claim 9 recites the limitation "said selected actual bytecode instruction"
in lines 2-3 of the claim. There is insufficient antecedent basis for this limitation in
the claim.

Claim 9 recites the limitation "said mappings" (two occurrences) in lines
6-7 of the claim. There is insufficient antecedent basis for this limitation in the claim.
The limitation "said mappings" should be changed to - said stack mappings - in order
to have proper antecedent basis.

b. Confusing and indefinite:

Claims 2 (line 25 of the claim) and 8 (line 2 of the claim) recite the
limitation "stack maps." It is unclear whether the limitation "stack maps" is the same
as the limitation "stack mappings." For art rejection purposes, the limitation "stack
maps" is considered equivalent to "stack mappings."

Claim 3 recites the limitation "to mimic an optimizing compiler," which is found
to be indefinite because one skilled in the art does not know precisely and clearly what
and how many optimizing technique(s) to mimic. For art rejection purposes, the
limitation "using information from preceding instructions to mimic an optimizing compiler" is
interpreted to mean using preceding translation information (e.g., pre-verification
step) to give the method the same optimizing result of a compiler such as the one
taught by U.S. Patent No. 5,999,731 to Yellin et al. (see instant disclosure, section
0012).

Double Patenting 

12. The nonstatutory double patenting rejection is based on a judicially created 
doctrine grounded in public policy (a policy reflected in the statute) so as to prevent 
the unjustified or improper timewise extension of the "right to exclude" granted by a 
patent and to prevent possible harassment by multiple assignees. See In re Goodman,
11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ
645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In
re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d
528, 163 USPQ 644 (CCPA 1969).

A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c)
may be used to overcome an actual or provisional rejection based on a
nonstatutory double patenting ground provided the conflicting application or patent is
shown to be commonly owned with this application. See 37 CFR 1.130(b).

Effective January 1, 1994, a registered attorney or agent of record may sign a
terminal disclaimer. A terminal disclaimer signed by the assignee must fully comply
with 37 CFR 3.73(b).

13. Claims 1-19 are provisionally rejected under the judicially created doctrine of
obviousness-type double patenting as being unpatentable over claims 1-29 of 
copending Application No. 10/016,794. This is a provisional double patenting 
rejection since the conflicting claims have not yet been patented. 

The subject matter claimed in the instant application is fully disclosed in the 
referenced copending application and would be covered by any patent granted on that 
copending application since the referenced copending application and the instant 
application are claiming common subject matter or obvious variation thereof, as 
shown in the following table(s). 

Furthermore, there is no apparent reason why applicant would be prevented 
from presenting claims corresponding to those of the instant application in the other 
copending application. See In re Schneller, 397 F.2d 350, 158 USPQ 210 (CCPA 1968).
See also MPEP § 804. 



Copending claim 1                          | Instant claim 1
-------------------------------------------+-------------------------------------------
A computer apparatus suitable for use in   | A computer apparatus suitable for use in
the combined compilation and verification  | the fast compilation of preverified
of platform neutral bytecode instructions  | platform neutral bytecode instructions
resulting in optimized machine code,       | resulting in high quality native machine
comprising:                                | code, comprising:
-------------------------------------------+-------------------------------------------
a central processing unit (CPU);           | a central processing unit (CPU);
-------------------------------------------+-------------------------------------------
a computer memory coupled to said CPU,     | a computer memory coupled to said CPU,
said computer memory comprised of a        | said computer memory comprised of a
computer readable medium;                  | computer readable medium;
-------------------------------------------+-------------------------------------------
a compilation-verification program         | a compilation program embodied on said
embodied on said computer readable medium, | computer readable medium, said compilation
said compilation-verification program      | program comprising:
comprising:                                |
-------------------------------------------+-------------------------------------------
a first code segment that receives a       | a first code segment that receives a class
bytecode listing;                          | file listing;
-------------------------------------------+-------------------------------------------
a second code segment that verifies said   | a second code segment that compiles said
bytecode listing is free of malicious and  | class file listing into machine code; and
improper code and compiles said bytecode   |
listing into machine code; and             |
-------------------------------------------+-------------------------------------------
a third code segment that interprets and   | a third code segment that interprets and
executes said machine code.                | executes said machine code.



Copending claim 7                          | Instant claim 3
-------------------------------------------+-------------------------------------------
A computer implemented method for          | A computer implemented method for
facilitating combined compilation and      | compilation of preverified platform
verification of platform neutral bytecode  | neutral bytecode instructions resulting in
instructions resulting in optimized        | high quality native machine code,
machine code, comprising the steps of:     | comprising the steps of:
-------------------------------------------+-------------------------------------------
receiving a class file onto a computer     | receiving a class file onto a computer
readable medium containing compilation     | readable medium containing compilation
procedure instructions, said class file    | procedure instructions, said class file
containing one or more methods containing  | containing one or more methods containing
platform neutral bytecode listings;        | platform neutral bytecode listings;
-------------------------------------------+-------------------------------------------
executing said compilation procedure       | executing said compilation procedure
instructions on said bytecode listings,    | instructions on said bytecode listings,
said compilation procedure instructions    | said compilation procedure instructions
also simultaneously verifying said         | sequentially processing each byte code
bytecode listings; and                     | instruction of said bytecode listing;
-------------------------------------------+-------------------------------------------
                                           | using information from preceding
                                           | instructions to mimic an optimizing
                                           | compiler, and
-------------------------------------------+-------------------------------------------
producing verified optimized machine code  | producing native machine code on said
on said computer readable medium.          | computer readable medium.



As can be seen from the above tables, all the claims of the instant application
are anticipated by those of the copending application. The invention of the instant
application, i.e., method for compilation of preverified platform neutral bytecode
instructions, is not patentably distinct from that of the copending invention since the
invention of the copending application is also related to a method for compilation of
platform neutral bytecode instructions. The only difference is that the method of the
copending application compiles and verifies bytecodes. The same verification step of
bytecodes of the copending method has been previously performed by the method of
the instant application to ensure that the bytecodes are free of malicious or improper
code.

Claim Rejections - 35 USC § 102 

14. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102
that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless -

(a) the invention was known or used by others in this country, or patented or described in a printed publication in this or a
foreign country, before the invention thereof by the applicant for a patent.

15. Claims 1 and 3-8 are rejected under 35 U.S.C. 102(a) as being unpatentable
over the admitted prior art (APA) of Figures 1A and 1B and of pages 1-6 of
applicant's background.

Claim 1

APA discloses at least:

"a compilation program embodied on said computer readable medium, said
compilation program comprising:"

"a first code segment that receives a class file listing" (see at least Figure 1A,
step 102; Figure 1B, step 152; Figure 2, step 202; and associated text);

"a second code segment that compiles said class file listing into machine code"
(see at least sections 0006-0012); and

"a third code segment that interprets and executes said machine code" (see at
least Figure 1B, step 154 and associated text; p. 2, lines 23-24; p. 3, line 14).

The APA of applicant's background does not specifically disclose:

"a central processing unit (CPU);"

"a computer memory coupled to said CPU, said computer memory comprised of
a computer readable medium."

However, this hardware support is deemed to be inherent to the APA
teaching of a compilation procedure. Without a CPU which executes the instructions
of a computer program (e.g., compiler and JAVA™ virtual machine) that is stored on
a computer readable medium and loaded onto a random access memory of a
computer system, the compilation procedure would be inoperative and would
produce no useful, concrete and tangible results.

Claim 3

APA discloses a method for compilation of pre-verified platform neutral
bytecode instructions comprising at least:

"receiving a class file onto a computer readable medium containing compilation
procedure instructions, said class file containing one or more methods containing platform neutral
bytecode listings" (see at least Figure 1A, step 102; Figure 1B, step 152; Figure 2, step 202;
and associated text; sections 0004-0005);

"executing said compilation procedure instructions on said bytecode listings, said
compilation procedure instructions sequentially processing each byte code instruction of said bytecode
listing" (see at least sections 0006-0008);

"using information from preceding instructions to mimic an optimizing compiler" (see at
least section 0012); and

"producing native machine code on said computer readable medium" (see at least
Figure 1B, step 154 and associated text; p. 2, lines 23-24; p. 3, line 14).

Claim 4

APA further discloses "wherein said compilation procedure selects first class to compile" (see
at least Figure 1A and associated text; section 0007).

Claim 5

APA further discloses "wherein said compilation procedure selects first method of said first
class to compile" (see at least Figure 1A and associated text; section 0007).

Claim 6

APA does not specifically disclose "wherein said compilation procedure [illegible] map
storage to store actual mappings and native code addresses and initialises stack mappings to "empty"
and addresses to "unknown"". However, these steps are deemed to be inherent to the
teaching of APA (see at least Figure 1A, step 104 and Figure 1B, step 154; and
associated text). Without this stored information, optimization would not be
possible.

Claim 7

APA does not specifically disclose "wherein said compilation procedure [illegible]
selects each bytecode instruction [illegible]". However, this step is
deemed to be inherent to the teaching of APA (see at least Figure 1A, step 102; Figure
1B, step 152; and associated text). As can be seen in steps 102 and 152, each bytecode
instruction is analyzed one at a time. Without this looping process that analyzes each
instruction and collects information therefrom, optimization would not be possible.

Claim 8

APA further discloses "wherein said compilation procedure detects stack maps for said
selected bytecode instruction" (see at least Figure 1A, step 104; Figure 1B, step 154; and
associated text).

Allowable Subject Matter 

16. Claim 2 is objected to as containing minor informalities and terms lacking
proper antecedent basis but would be allowable if rewritten to correct these
deficiencies.

Claims 9-19 are objected to as being dependent upon a rejected base claim, but
would be allowable if rewritten in independent form including all of the limitations of
the base claim and any intervening claims.

It is noted, however, that these claims when taken individually and without
including all of the limitations of the base claim and any intervening claims are not
allowable.

It is suggested that the aspect of the invention that consists of creating
optimized machine code from bytecode in a single sequential pass in which
information from preceding instruction translations is used to perform the same
optimizing process of an optimizing compiler without the extensive memory and time
requirements be recited in independent claims 1 and 3 to particularly point out and
distinctly claim the invention.



Conclusion 

17. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. 

18. Any inquiry concerning this communication or earlier communications from
the examiner should be directed to Hoang-Vu A Nguyen-Ba whose telephone
number is (703) 305-0103. The examiner can normally be reached on Tuesday-Friday,
6:00-16:30.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Tuan Dam, can be reached on (703) 305-4552. After October 25, 2004,
the examiner can be reached at (571) 272-3701 and the examiner's supervisor at (571)
272-3695. The fax phone number for the organization where this application or
proceeding is assigned is 703-872-9306.

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for
published applications may be obtained from either Private PAIR or Public PAIR.
Status information for unpublished applications is available through Private PAIR
only. For more information about the PAIR system, see http://pair-direct.uspto.gov.
Should you have questions on access to the Private PAIR system, contact the
Electronic Business Center (EBC) at 866-217-9197 (toll-free).

Art Unit 2122

September 2, 2004

ANTONY NGUYEN-BA

Jalapeno is a virtual machine for Java™ servers
written in the Java language. To be able to
address the requirements of servers
(performance and scalability in particular),
Jalapeno was designed "from scratch" to be as
self-sufficient as possible. Jalapeno's unique
object model and memory layout allow a
hardware null-pointer check as well as fast
access to array elements, fields, and methods.
Run-time services conventionally provided in
native code are implemented primarily in Java.
Java threads are multiplexed by virtual
processors (implemented as operating system
threads). A family of concurrent object allocators
and parallel type-accurate garbage collectors is
supported. Jalapeno's interoperable compilers
enable quasi-preemptive thread switching and
precise location of object references. Jalapeno's
dynamic optimizing compiler is designed to
obtain high quality code for methods that are
observed to be frequently executed or
computationally intensive.



Jalapeno is a Java™ virtual machine (Jvm) for
servers. The memory constraints on a server are
not as tight as they are on other platforms. On the
other hand, a Jvm for servers must satisfy
requirements such as the following that are not as stringent
for client, personal, or embedded Jvms:



1. Exploitation of high-performance processors: Current
just-in-time (JIT) compilers do not perform
the extensive optimizations for exploiting modern
hardware features (memory hierarchy, instruction-level
parallelism, multiprocessor parallelism, etc.) that are
necessary to obtain performance comparable with
statically compiled languages.

2. SMP scalability: Shared-memory multiprocessor
(SMP) configurations are very popular for server
machines. Some Jvms map Java threads directly
onto heavyweight operating system threads. This
leads to poor scalability of multithreaded Java
programs on an SMP as the number of Java
threads increases.

3. Thread limits: Many server applications need to
create new threads for each incoming request.
However, due to operating system constraints,
some Jvms are unable to create a large number
of threads and hence can only deal with a limited
number of simultaneous requests. These constraints
are severely limiting for applications that
need to support thousands of users simultaneously.

4. Continuous availability: Server applications must run
continuously for long durations (e.g., several
months). This does not appear to be a priority
for current Jvms.

5. Rapid response: Most server applications have
stringent response-time requirements (e.g., at
least 90 percent of requests must be served in less
than a second). However, many current Jvms perform
nonincremental garbage collection leading
to severe response-time failures.

6. Library usage: Server applications written in Java
code are typically based on existing libraries
(beans, frameworks, components, etc.) rather
than being written "from scratch." However, since
these libraries are written to handle generic cases,
they often perform poorly on current Jvms.

7. Graceful degradation: As the requests made on
a server oversaturate its capacity to fulfill them,
it is acceptable for the performance of the server
to degrade. It is not acceptable for the server to
crash.

©Copyright 2000 by International Business Machines Corporation.
Copying in printed form for private use is permitted without
payment of royalty provided that (1) each reproduction is done
without alteration and (2) the Journal reference and IBM copyright
notice are included on the first page. The title and abstract,
but no other portions, of this paper may be copied or distributed
royalty free without further permission by computer-based and
other information-service systems. Permission to republish any
other portion of this paper must be obtained from the Editor.

In Jalapeno, Requirement 1 is addressed by a dynamic
optimizing compiler; lighter-weight compilers
are provided for code that has not been shown
to be a performance bottleneck. Requirements 2 and
3 are addressed by the implementation of lightweight
threads in Jalapeno. Implementing Jalapeno in the
Java language addressed Requirement 4: Java type
safety aids in producing correct code, and the Java
automatic storage management prevents "dangling
pointers" and reduces "storage leaks." We expect
Requirement 5 to be satisfied by the concurrent and
incremental memory management algorithms currently
being investigated. Requirement 6 will be satisfied
by specialization transformations in the Jalapeno
optimizing compiler that tailor the dynamically
compiled code for a library (for example) to the calling
context of the server application. Although we
know of no programmatic way to guarantee satisfaction
of Requirement 7, we try not to lose sight of
it.

The paper is organized as follows. The next section
considers implementation issues. The following section
presents the Jalapeno Jvm, including its object
model and memory layout and its run-time, thread
and synchronization, memory management, and
compilation subsystems. Following sections examine
Jalapeno's optimizing compiler, describe Jalapeno's
current functional status and give some preliminary
performance results, and discuss related
work. The final section presents our conclusions. Two
appendices are included to explain how Jalapeno's run-time
services evade some Java restrictions
while preserving the integrity of the language for
Jalapeno's users, and to detail the process of
bootstrapping Jalapeno.

Design and implementation issues 

The goal of the Jalapeno project is to produce a
world-class server Jvm "from scratch." Our approach
is to create a flexible test bed where novel virtual
machine ideas can be explored, measured, and evaluated.
Our development methodology avoids premature
optimization: simple mechanisms are initially
implemented and are refined only when they are observed
to be performance bottlenecks.

Portability is not a design goal: where an obvious performance
advantage can be achieved by exploiting
the peculiarities of Jalapeno's target architecture,
PowerPC* architecture SMPs (symmetrical multiprocessors)
running AIX* (Advanced Interactive
Executive), we feel obliged to take it. Thus,
Jalapeno's object layout and locking mechanisms are
quite architecture-specific. On the other hand, we
are aware that we may want to port Jalapeno to some
other platform in the future. Thus, where performance
is not an issue, we endeavor to make Jalapeno
as portable as possible. For performance as well
as portability, we strive to minimize Jalapeno's dependence
on its host operating system.

The original impetus for building Jalapeno in the
Java language was to see if it could be done. The
development payoffs of using a modern, object-oriented,
type-safe programming language with automatic
memory management have been considerable.
(For instance, we have encountered no dangling
pointer bugs, except for those introduced by early
versions of our copying garbage collectors which, of
necessity, circumvented the Java memory model.)
We expect to achieve performance benefits from Java
development as well: first, no code need be executed
to bridge an interlinguistic gap between user code
and run-time services; and second, because of this
seamless operation, the optimizing compiler can
simultaneously optimize user and run-time code, and
even compile frequently executed run-time services
in line within user code.

The Jalapeno implementation must sometimes evade
the restrictions of the Java language. At the same
time, Jalapeno must enforce these restrictions on its
users. Jalapeno's mechanism for performing such
controlled evasion is presented in Appendix A.

Very little of Jalapeno is not written in the Java language.
The Jalapeno virtual machine is designed to
run as a user-level AIX process. As such, it must use
the host operating system to access the underlying
file system, network, and processor resources. To access
these resources, we were faced with a choice:
the AIX kernel could be called directly using low-level
system calling conventions, or it could be accessed
through the standard C library. We chose the latter
path to isolate ourselves from release-specific operating
system kernel dependencies. This required that
a small portion of Jalapeno be written in C rather
than Java code.

To date, the amount of C code required has been
small (about 1000 lines). About half of this code consists
of simple "glue" functions that relay calls between
Java methods and the C library. The only purpose
of this code is to convert parameters and return
values between Java format and C format. The other
half of the C code consists of a "boot" loader and
two signal handlers. The boot loader allocates memory
for the virtual machine image, reads the image
from disk into memory, and branches to the image
startup code (see Appendix B). The first signal handler
captures hardware traps (generated by null
pointer dereferences) and trap instructions (generated
for array bounds and divide-by-zero checks),
and relays these into the virtual machine, along with
a snapshot of the register state. The other signal handler
passes timer interrupts (generated every 100 milliseconds)
to the running Jalapeno system.

Jvm organization 

Following subsections describe Jalapeno's object 
model, run-time subsystem, thread and synchroni- 
zation subsystem, memory management subsystem, 
and compiler subsystem. 

Java objects are laid out to allow fast access to field
and array elements, to achieve hardware null pointer
checks, to provide a four-instruction virtual-method
dispatch, and to enable less frequent operations such
as synchronization, type-accurate garbage collection,
and hashing. Fast access to static objects and methods
is also supported.

In conventional Jvms, run-time services (exception
handling, dynamic type checking, dynamic class loading,
interface invocation, input and output, reflection,
etc.) are implemented by native methods written
in C, C++, or assembler. In Jalapeno, these
services are implemented primarily in Java code.

Rather than implement Java threads as operating
system threads, Jalapeno multiplexes Java threads
on virtual processors, implemented as AIX pthreads.
Jalapeno's locking mechanisms are implemented
without operating system support.
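
The multiplexing idea in the preceding paragraph can be illustrated with a rough sketch. It is not Jalapeno's scheduler: the class name `VirtualProcessorSketch`, the method `runAll`, and the shared run queue are all assumptions made for illustration, tasks here run to completion rather than being switched quasi-preemptively, and the "virtual processors" are ordinary `java.lang.Thread` instances standing in for pthreads.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: many lightweight tasks share a few "virtual processors".
class VirtualProcessorSketch {

    /** Runs `tasks` small tasks on `processors` worker threads (processors >= 1)
     *  and returns the names of the workers that actually executed a task. */
    static Set<String> runAll(int processors, int tasks) {
        ConcurrentLinkedQueue<Runnable> runQueue = new ConcurrentLinkedQueue<>();
        Set<String> processorsUsed = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(tasks);

        // Each task only records which virtual processor ran it.
        for (int i = 0; i < tasks; i++) {
            runQueue.add(() -> {
                processorsUsed.add(Thread.currentThread().getName());
                done.countDown();
            });
        }

        // A fixed, small number of OS-backed threads drain the shared queue.
        for (int p = 0; p < processors; p++) {
            new Thread(() -> {
                Runnable next;
                while ((next = runQueue.poll()) != null) {
                    next.run();
                }
            }, "vp-" + p).start();
        }

        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processorsUsed;
    }

    public static void main(String[] args) {
        Set<String> used = runAll(4, 1000);
        System.out.println("1000 tasks ran on " + used.size() + " virtual processor(s)");
    }
}
```

The point of the sketch is only that the number of operating system threads stays fixed at `processors` however many tasks are queued, which is the property Requirements 2 and 3 are concerned with.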

Jalapeno supports a family of memory managers,
each consisting of an object allocator and a garbage
collector. All allocators are concurrent. Currently,
all collectors are stop-the-world, parallel, and type-accurate
collectors. Generational and nongenerational,
copying and noncopying managers are supported.
Incremental collectors are being investigated.

Jalapeno does not interpret bytecodes. Instead these
are compiled to machine code before execution.
Jalapeno supports three interoperable compilers that
address different trade-offs between development time,
compile time, and run time. These compilers are
integral to Jalapeno's design: they enable thread
scheduling, synchronization, type-accurate garbage
collection, exception handling, and dynamic class loading.

Object model and memory layout. Values in the Java 
language are either primitive (e.g., int, double, etc.) 
or they are references (that is, pointers) to objects. 
Objects are either arrays having components or sca- 
lars having fields. Jalapeno's object model is gov- 
erned by four criteria: 

• Field and array accesses should be fast. 

• Virtual method dispatch should be fast. 

• Null pointer checks should be performed by the 
hardware. 

• Other (less frequent) Java operations should not 
be prohibitively slow. 

Assuming the reference to an object is in a register,
the object's fields can be accessed at a fixed
displacement in a single instruction. To facilitate array
access, the reference to an array points to the first
(zeroth) component of an array and the remaining
components are laid out in ascending order. The
number of components in an array, its length, is kept
just before its first component.

The Java language requires that an attempt to
access an object through a null object reference
generate a NullPointerException. In Jalapeno,
references are machine addresses, and null is represented
by Address 0. The AIX operating system permits loads
from low memory, but accesses to very high memory,
at small negative offsets from a null pointer,
normally cause hardware interrupts. Thus, attempts to
index off a null Jalapeno array reference are trapped
by the hardware, because array accesses require
loading the array length, which is -4 bytes off the array
reference. A hardware null-pointer check for field
accesses is effected by locating fields at negative
offsets from the object reference.

Figure 1 Layout of an array object and a scalar object in Jalapeno

In summary, in Jalapeno, arrays grow up from the
object reference (with the array length at a fixed
negative offset), while scalar objects grow down from the
object reference with all fields at a negative offset
(see Figure 1). A field access is accomplished with
a single instruction using base-displacement
addressing. Most array accesses require three instructions.
A single trap instruction verifies that the index is
within the bounds of the array. Except for byte (and
boolean) arrays, the component index must then be
shifted to get a byte index. The access itself is
accomplished using base-index addressing.
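
The array layout and access sequence described above can be illustrated with a small Java simulation. This is only a sketch under stated assumptions, not Jalapeno code: the class ArrayLayoutSketch, its ByteBuffer stand-in for raw memory, and all method names are hypothetical, and an unsigned comparison stands in for the trap instruction that performs the bounds check.

```java
import java.nio.ByteBuffer;

/** Toy model of the array layout: the length word sits 4 bytes below the reference. */
final class ArrayLayoutSketch {
    static final int HEADER_BYTES = 8;       // two-word object header (assumed size)
    static final int LENGTH_OFFSET = -4;     // array length, relative to the reference
    static final int LOG_BYTES_PER_INT = 2;  // shift that turns an int index into a byte index

    // Hypothetical stand-in for raw memory; addresses are indices into this buffer.
    private final ByteBuffer memory;

    ArrayLayoutSketch(int bytes) {
        memory = ByteBuffer.allocate(bytes);
    }

    /** Lays out an int array starting at 'start' and returns its reference. */
    int newIntArray(int start, int length) {
        int ref = start + HEADER_BYTES + 4;  // the reference points at element 0
        memory.putInt(ref + LENGTH_OFFSET, length);
        return ref;
    }

    int length(int ref) {
        return memory.getInt(ref + LENGTH_OFFSET);
    }

    private int byteAddress(int ref, int index) {
        // One unsigned comparison rejects both negative and too-large indices.
        if (Integer.compareUnsigned(index, length(ref)) >= 0) {
            throw new ArrayIndexOutOfBoundsException(index);
        }
        return ref + (index << LOG_BYTES_PER_INT);  // shift, then base plus index
    }

    int loadInt(int ref, int index) {
        return memory.getInt(byteAddress(ref, index));
    }

    void storeInt(int ref, int index, int value) {
        memory.putInt(byteAddress(ref, index), value);
    }
}
```

In the real system a null reference would make the length load touch an address just below zero and trap in hardware; the simulation has no equivalent of that trap.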



Object headers. A two-word object header is
associated with each object. This header supports virtual
method dispatch, dynamic type checking, memory
management, synchronization, and hashing. It is
located 12 bytes below the value of a reference to the
object. (This leaves room for the length field in case
the object is an array; see Figure 1.)

One word of the header is a status word. The status 
word is divided into three bit fields. The first bit field 
is used for locking (described later). The second bit 
field holds the default hash value of hashed objects. 
The third bit field is used by the memory management
subsystem. (The size of these bit fields is
determined by build-time constants.)

The other word of an object header is a reference to
the Type Information Block (TIB) for the object's
class. A TIB is an array of Java object references. Its
first component describes the object's class (including
its superclass, the interfaces it implements, offsets
of any object reference fields, etc.). The remaining
components are compiled method bodies (executable
code) for the virtual methods of the class. Thus,
the TIB serves as Jalapeno's virtual method table.

Figure 2 The Jalapeno Table of Contents and other objects

Virtual methods. Method bodies are arrays of machine
instructions (ints). A virtual method dispatch
entails loading the TIB pointer at a fixed offset off
the object reference, loading the address of the
method body at a given offset off the TIB pointer,
moving this address to the PowerPC "link-register,"
and executing a branch-and-link instruction: four
instructions in all.
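
The dispatch sequence can be mirrored in plain Java as two loads followed by a call. The sketch below is illustrative only: TibDispatchSketch, Obj, MethodBody, and the slot numbering are hypothetical names chosen for this example, and a Java interface call stands in for the link-register move and branch-and-link.

```java
/** Toy virtual-method table: slot 0 describes the class, later slots hold method bodies. */
final class TibDispatchSketch {
    /** Hypothetical stand-in for a compiled method body. */
    interface MethodBody {
        String invoke(Obj receiver);
    }

    /** Hypothetical object; only the TIB reference from its header is modeled. */
    static final class Obj {
        final Object[] tib;

        Obj(Object[] tib) {
            this.tib = tib;
        }
    }

    static Object[] makeTib(String classDescription, MethodBody... bodies) {
        Object[] tib = new Object[bodies.length + 1];
        tib[0] = classDescription;                       // first component: the class
        System.arraycopy(bodies, 0, tib, 1, bodies.length);
        return tib;
    }

    static String invokeVirtual(Obj receiver, int methodSlot) {
        Object[] tib = receiver.tib;                     // load the TIB pointer from the header
        MethodBody body = (MethodBody) tib[methodSlot];  // load the method body at a fixed slot
        return body.invoke(receiver);                    // stands in for mtlr plus branch-and-link
    }
}
```

Two classes that place an overriding method in the same slot dispatch correctly without the caller knowing the receiver's class, which is the property the fixed offsets provide.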



Static fields and methods (and others). All static fields
and references to all static method bodies are stored
in a single array called the Jalapeno Table of
Contents (JTOC). A reference to this array is maintained
in a dedicated machine register (the JTOC register).
All Jalapeno's global data structures are
accessible through the JTOC. Literals, numeric constants,
and references to string constants are also stored in
the JTOC. To enable fast common-case dynamic type
checking, the JTOC also contains references to the
TIB for each class in the system. The JTOC is depicted
in Figure 2. Although declared to be an array of ints,
the JTOC contains values of all types. A JTOC
descriptor array, coindexed with the JTOC, identifies the
entries containing references.

Figure 3 A thread's method invocation stack
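
A coindexed descriptor array of this kind might look as follows. This is a minimal sketch, not the Jalapeno data structure: JtocSketch, its descriptor codes, and its methods are assumptions made for illustration, and references are modeled as plain int addresses.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy table of contents: an int array plus a coindexed array describing each slot. */
final class JtocSketch {
    // Hypothetical descriptor codes; the text says only that references are identified.
    static final byte EMPTY = 0;
    static final byte INT_VALUE = 1;
    static final byte REFERENCE = 2;

    private final int[] slots;
    private final byte[] descriptors;  // descriptors[i] describes slots[i]
    private int next = 0;

    JtocSketch(int capacity) {
        slots = new int[capacity];
        descriptors = new byte[capacity];
    }

    private int add(int value, byte kind) {
        slots[next] = value;
        descriptors[next] = kind;
        return next++;
    }

    int addInt(int value) {
        return add(value, INT_VALUE);
    }

    /** Stores a reference, modeled here as a raw address held in an int slot. */
    int addReference(int address) {
        return add(address, REFERENCE);
    }

    int get(int index) {
        return slots[index];
    }

    /** The slots a garbage collector would treat as roots. */
    List<Integer> referenceSlots() {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < next; i++) {
            if (descriptors[i] == REFERENCE) {
                result.add(i);
            }
        }
        return result;
    }
}
```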

Method invocation stacks. Jalapeno's stack layout is 
depicted in Figure 3. There is a stack frame for each 
method invocation. (The optimizing compiler may 
omit stack frames for leaf and inline methods.) A 
stack frame contains space to save nonvolatile
registers, a local data area where usage is
compiler-dependent, and an area for parameters that are to be
passed to called methods and that will not fit in
Jalapeno's volatile registers. The last three words in a
stack frame are: a compiled-method identifier
(identifying information about the method for the stack
frame), a next-instruction pointer (the return address
of any called method), and a previous-frame pointer.

Method invocation stacks and the JTOC are the only
two Jalapeno structures that violate the Java
language requirement that arrays not contain both
primitives and references. Since neither is directly
accessible to users, this is not a lapse of security. However,
in order to facilitate type-accurate garbage
collection, Jalapeno's compilers must maintain reference
maps (described later) that allow location of the
object references in a stack.

Run-time subsystem. Through judicious exploitation
of the MAGIC class (see Appendix A), Jalapeno's
run-time subsystem provides, (mostly) in Java code,
services (exception handling, dynamic type checking,
dynamic class loading, interface invocation, I/O,
reflection, etc.) that are conventionally implemented
with native code.

Exceptions. A hardware interrupt is generated if a 
null pointer is dereferenced, an array index is out of 
bounds, an integer is divided by zero, or a thread's
method-invocation stack overflows. These interrupts
are caught by a small C interrupt handler that causes
a Java method to be run. This method builds the
appropriate exception and passes it to the
deliverException method. The deliverException method is
called with software-generated exceptions as well.
It has two responsibilities. First, it must save in the
exception object information that would allow a stack
trace to be printed if one is needed. It does this by
"walking" up the stack and recording the compiled
method identifiers and next-instruction pointers for
each stack frame. Second, it must transfer control 
to the appropriate "catch" block. This also involves 
walking the stack. For each stack frame, it locates 
the compiled-method object for the method body 
that produced the stack frame. It calls a method of 
this object to determine if the exception happened 
within an appropriate "try" block. If so, control is 
transferred to the corresponding catch block. If not, 
any locks held by the stack frame are released and 
it is deallocated. The next stack frame is then con- 
sidered. If no catch block is found, the thread is killed. 
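
The two stack walks performed during exception delivery can be sketched as below. This is a simplified model under stated assumptions: DeliverExceptionSketch, Frame, and Outcome are hypothetical names, each frame has at most one try range, and the check that the catch block matches the exception's type is omitted.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of exception delivery: record a trace, then unwind to a catch block. */
final class DeliverExceptionSketch {
    /** Hypothetical frame with at most one try range, [tryStart, tryEnd). */
    static final class Frame {
        final int methodId;
        final int nextInstruction;
        final int tryStart;
        final int tryEnd;
        final String catchBlock;

        Frame(int methodId, int nextInstruction, int tryStart, int tryEnd, String catchBlock) {
            this.methodId = methodId;
            this.nextInstruction = nextInstruction;
            this.tryStart = tryStart;
            this.tryEnd = tryEnd;
            this.catchBlock = catchBlock;
        }

        boolean inTryBlock() {
            return tryStart <= nextInstruction && nextInstruction < tryEnd;
        }
    }

    static final class Outcome {
        final List<String> stackTrace;
        final String catchBlock;  // null means no handler was found (thread killed)

        Outcome(List<String> stackTrace, String catchBlock) {
            this.stackTrace = stackTrace;
            this.catchBlock = catchBlock;
        }
    }

    /** 'stack' iterates from the newest frame to the oldest. */
    static Outcome deliver(Deque<Frame> stack) {
        // First walk: save enough information to print a stack trace later.
        List<String> trace = new ArrayList<>();
        for (Frame f : stack) {
            trace.add("method " + f.methodId + " @" + f.nextInstruction);
        }
        // Second walk: unwind until a frame's try block covers the faulting point.
        while (!stack.isEmpty()) {
            Frame top = stack.peek();
            if (top.inTryBlock()) {
                return new Outcome(trace, top.catchBlock);
            }
            stack.pop();  // the real system also releases locks held by the frame
        }
        return new Outcome(trace, null);
    }
}
```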

Dynamic class loading. One of the innovative features
of the Java language is its provision for loading
classes during the execution of an application. When
a Jalapeno compiler encounters a bytecode (putstatic
or invokevirtual, for example) that refers to a class
that has not been loaded, it does not load the class
immediately. Rather, the compiler emits code that,
when executed, first ensures that the referenced class
is loaded (and resolved and instantiated) and then
performs the operation. Note that when this code
is generated the compiler cannot know the actual
offset values (because they are not assigned until the
class is loaded) of fields and methods of the class
(e.g., the JTOC index assigned to a static field). The
baseline compiler (Jalapeno's compilers are
described later) handles this uncertainty by emitting
code that calls a run-time routine that performs any
necessary class loading and then overwrites the call
site with the code the baseline compiler would have
produced had the class been resolved at initial code
generation time. This is particularly tricky on an SMP
with processors that adhere to a relaxed memory
consistency model.

The optimizing compiler uses an alternative
approach based on an added level of indirection: when
executed, the code it emits loads a value from an
offset array. If the offset value is valid (nonzero), it is
used to complete the operation. If the offset is
invalid, the required class is loaded and, in the
process, the offset array is updated to contain valid
values for all of the class's methods and fields. For each
dynamically linked site, the baseline compiler's
approach incurs substantial overhead the first time the
site is executed, but every subsequent execution
incurs no overhead, while the optimizing compiler's
approach incurs a small overhead on every
execution of the site. However, the optimizing compiler
is not normally called on a method until the method
has executed several times; any dynamically linked
site this compiler sees can be assumed to be very
rarely executed. It is not yet clear which approach
would be most appropriate for the quick compiler.
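
The offset-array indirection used by the optimizing compiler can be sketched in a few lines. The names below (LazyOffsetSketch, ClassLoaderHook, resolve) are hypothetical, and a callback stands in for class loading; only the convention that zero marks an unresolved offset comes from the text.

```java
/** Toy model of the offset-array indirection: 0 marks an offset not yet resolved. */
final class LazyOffsetSketch {
    /** Hypothetical hook that loads the class declaring a member and fills in offsets. */
    interface ClassLoaderHook {
        void loadClassDeclaring(int memberId, int[] offsets);
    }

    private final int[] offsets;
    private final ClassLoaderHook loader;
    int loads = 0;  // counts slow-path class loads, for illustration only

    LazyOffsetSketch(int members, ClassLoaderHook loader) {
        this.offsets = new int[members];
        this.loader = loader;
    }

    /** What the emitted code does at each dynamically linked site. */
    int resolve(int memberId) {
        int offset = offsets[memberId];  // small cost paid on every execution
        if (offset == 0) {               // invalid: the class has not been loaded
            loads++;
            loader.loadClassDeclaring(memberId, offsets);
            offset = offsets[memberId];
        }
        return offset;
    }
}
```

Because loading one class fills in the offsets of all its members, a later site that refers to a different member of the same class takes the fast path.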

Input and output. I/O requires operating system
support. To read a block from a file, an AIX stack frame
is constructed and an operating system routine is 
called (through the C library) with an address to 
which to copy its result. This address is a Java array. 
Care is taken to prevent a copying garbage collector 
from moving this object until the call is complete (see 
Appendix A for details). So far, we have not observed 
a performance degradation from delaying garbage 
collection until the read completes. Other I/O
services are handled similarly.

Reflection. Java's reflection mechanism allows
run-time access to fields (given their names and types)
and run-time invocation of methods (given their
signatures). It is easy for Jalapeno to support reflective
field access: the name is turned into an offset and
the access is performed at the appropriate raw
memory address. Reflective method invocation is a little
harder. The address of the method body is obtained
by finding the signature in a table. An artificial stack
frame is constructed. (Since this stack frame does
not contain any object references, it is not necessary
to build a reference map for it.) The method's
parameters are carefully unwrapped and loaded into
registers. The method is then called. When it returns,
the artificial stack frame must be disposed of, and
the result wrapped and returned to the reflective call.

Thread and synchronization subsystem. Rather than 
mapping Java threads to operating system threads 
directly, Jalapeno multiplexes Java threads on vir- 
tual processors that are implemented as AIX pthreads. 
This decision was motivated by three concerns. We 
needed to be able to effect a rapid transition between 
mutation (by normal threads) and garbage collection.
We wanted to implement locking without using
AIX services. We wanted to support rapid thread
switching.

Currently, Jalapeno establishes one virtual processor 
for each physical processor. Additional virtual
processors may eventually be used to mask I/O latency.
The only AIX service required by the subsystem is
a periodic timer interrupt provided by the incinterval
system call. Jalapeno's locking mechanisms make no
system calls.

Quasi-preemption. Jalapeno's threads are neither
"run-until-blocked" nor fully preemptive. Reliance
on voluntary yields would not have allowed Jalapeno
to make the progress guarantees required for a server
environment. We felt that arbitrary preemption
would have radically complicated the transition to
garbage collection and the identification of object
references on thread stacks. In Jalapeno, a thread
can be preempted, but only at predefined yield points.

The compilers provide location information for ob- 
ject references on a thread's stack at yield points. 
Every method on the stack is at a safe point (described
later). This allows compilers to optimize code (by
maintaining an internal pointer, for example)
between safe points that would frustrate type-accurate
garbage collection if arbitrary preemption were al- 
lowed. 

Locks. Concurrent execution on an SMP requires syn- 
chronization. Thread scheduling and load balancing 
(in particular) require atomic access to global data 
structures. User threads also need to synchronize
access to their global data. To support both system and
user synchronization, Jalapeno has three kinds of
locks.

A processor lock is a low-level primitive used for
thread scheduling (and load balancing) and to
implement Jalapeno's other locking mechanisms.
Processor locks are Java objects with a single field that
identifies the virtual processor that owns the lock.
If the field is null, the lock is not owned. The identity
of a virtual processor is maintained in a dedicated
processor (PR) register. To acquire a processor lock
for the thread it is running, a virtual processor:

• Loads the lock owner field making a CPU
reservation (PowerPC lwarx)

• Checks that this field is null

• Conditionally stores the PR register into it
(PowerPC stwcx)

If the stwcx succeeds, the virtual processor owns the
lock. If the owner field is not null (another virtual
processor owns the lock), or if the stwcx instruction
fails, the virtual processor will try again (i.e., spin).
A processor lock is unlocked by storing null into the
owner field. Processor locks cannot be acquired
recursively. Because processor locks "busy wait," they
must only be held for very short intervals. A thread
may not be switched while it owns a processor lock
for two reasons: because it could not release the lock
until it resumes execution, and because our
implementation would improperly transfer ownership of
the lock to the other threads that execute on the
virtual processor.
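
A processor lock of this shape can be approximated in portable Java. The sketch below is an analogy, not the Jalapeno implementation: ProcessorLockSketch is a hypothetical class, and AtomicReference.compareAndSet stands in for the lwarx/stwcx pair.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Toy spin lock with a single owner field; null means the lock is not owned. */
final class ProcessorLockSketch {
    private final AtomicReference<Object> owner = new AtomicReference<>(null);

    /** One acquisition attempt: succeeds only if the owner field is null. */
    boolean tryLock(Object virtualProcessor) {
        // compareAndSet plays the role of the lwarx / check / stwcx sequence.
        return owner.compareAndSet(null, virtualProcessor);
    }

    /** Spins until the lock is acquired, so it must only guard very short sections. */
    void lock(Object virtualProcessor) {
        while (!tryLock(virtualProcessor)) {
            // busy wait: try again
        }
    }

    /** Unlocking stores null into the owner field. */
    void unlock() {
        owner.set(null);
    }

    Object owner() {
        return owner.get();
    }
}
```

As in the text, the lock is not recursive: a second tryLock by the current owner fails, and a second lock call by the owner would spin forever.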

Jalapeno's other locking mechanisms are based on
thin locks: bits in an object header are used for
locking in the absence of contention; these bits identify a
heavyweight lock when there is contention. Jalapeno's
approach differs in two ways from the previous work.
In the previous work, the heavyweight locking
mechanism was an operating system service; here it is a
Java object. Here, if Thread A has a thin lock on an
object, Thread B can promote the lock to a thick lock.
In the previous work, only the thread that owned a 
thin lock could promote it. 

A bit field in the status word of an object header (see 
earlier discussion) is devoted to locking. One bit tells 
whether or not a thick lock is associated with the ob- 
ject. If so, the rest of this bit field is the index of this 
lock in a global array. This array is partitioned into 
virtual-processor regions to allow unsynchronized
allocation of thick locks. If the thick bit is not set, the
rest of the bit field is subdivided into two: the thin
lock owner subfield identifies the thread (if any)
holding a thin lock on the object. (The sizes of the bit
fields can be adjusted to support up to half a million
threads.) The recursion count subfield encodes the
number of times the owner holds the lock: unlike a
processor lock, a thin lock can be recursively
acquired. If the object is not locked, the entire locking
bit field is zero.

To acquire a thin lock, a thread sets the thin lock
owner bit field to its identifier. This is only allowed
if the locking bit field is zero. The identifier of the
thread currently running on a virtual processor is
kept in a dedicated thread identifier (TI) register.
Again, lwarx and stwcx instructions are used to
ensure that the thin lock is acquired atomically.
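
The locking bit field and the thin-lock fast path can be sketched as follows. Everything specific here is an assumption made for illustration: the class ThinLockSketch, the field widths, the position of the thick bit, thread identifiers starting at 1, and the use of compareAndSet in place of lwarx/stwcx. Promotion to a thick lock is not modeled; the sketch simply reports failure where it would be needed.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy thin lock: [thick bit | owner thread id | recursion count], zero when unlocked. */
final class ThinLockSketch {
    static final int COUNT_BITS = 8;                      // assumed width
    static final int COUNT_MASK = (1 << COUNT_BITS) - 1;
    static final int THICK_BIT = 1 << 31;                 // assumed position

    private final AtomicInteger lockBits = new AtomicInteger(0);

    /** Returns false where the real system would wait or promote to a thick lock. */
    boolean tryAcquire(int threadId) {
        while (true) {
            int old = lockBits.get();
            if (old == 0) {
                // Unlocked: install the owner id with a recursion count of zero.
                if (lockBits.compareAndSet(0, threadId << COUNT_BITS)) {
                    return true;
                }
                continue;  // lost a race; look at the word again
            }
            if ((old & THICK_BIT) != 0) return false;            // thick lock in place
            if ((old >>> COUNT_BITS) != threadId) return false;  // owned by another thread
            if ((old & COUNT_MASK) == COUNT_MASK) return false;  // count would overflow
            if (lockBits.compareAndSet(old, old + 1)) {
                return true;                                     // recursive acquisition
            }
        }
    }

    void release(int threadId) {
        while (true) {
            int old = lockBits.get();
            if ((old & THICK_BIT) != 0 || (old >>> COUNT_BITS) != threadId) {
                throw new IllegalMonitorStateException();
            }
            int updated = (old & COUNT_MASK) == 0 ? 0 : old - 1;
            if (lockBits.compareAndSet(old, updated)) {
                return;
            }
        }
    }

    boolean isLocked() {
        return lockBits.get() != 0;
    }
}
```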

A thick lock is a Java object with six fields. The
mutex field is a processor lock that synchronizes access
to the thick lock. The associatedObject is a reference
to the object that the thick lock currently governs.
The ownerId field contains the identifier of the thread
that owns the thick lock, if any. The recursionCount
field records the number of times the owner has
locked the lock. The enteringQueue field is a queue
of threads that are contending for the lock. And, the
waitingQueue field is a queue of threads awaiting
notification on the associatedObject.

Conversion of a thin lock to a thick one entails: 

1. Creating a thick lock

2. Acquiring its mutex 

3. Loading the object's status word, setting a
reservation (lwarx)

4. Conditionally storing (stwcx) the appropriate val- 
ue — thick-lock bit set and the index for this thick 
lock — into the locking bit field of the object 
header 

5. Repeating Steps 3 and 4 until the conditional store 
succeeds 

6. Filling in the fields of the thick lock to reflect the 
object's status 

7. Releasing the thick lock's mutex 

There are two details to be considered about
locking on the PowerPC. First, the reservation (of a lwarx)
could be lost for a variety of reasons other than
contention, including a store to the same cache line as
the word with the reservation or an operating
system context switch of the virtual processor. Second,
before a lock (of any kind) is released, a sync
instruction must be executed to ensure that the caches in
other CPUs see the changes made while the lock was
held. Similarly, after a lock is acquired, an isync
instruction must be executed so that no subsequent
instruction executes in a stale context.

Thread scheduling. Jalapeno implements a lean
thread scheduling algorithm that is designed to have
a short path length, to minimize synchronization, and
to provide some support for load balancing. Thread
switching (and locks) are used to implement the yield
and sleep methods of java.lang.Thread and the wait,
notify, and notifyAll methods of java.lang.Object as well
as quasi-preemption. A thread switch consists of the
following operations:

• Saving the state of the old thread 

• Removing a new thread from a queue of threads 
waiting to execute 

• Placing the old thread on some thread queue (lock- 
ing the queue if necessary) 

• Releasing the processor lock (if any) guarding
access to this queue

• Restoring the new thread's state and resuming its 
execution 

In the processor object associated with a virtual pro- 
cessor there are three queues of executable threads. 
An idleQueue holds an idle thread that will execute
whenever there is nothing else to do. A readyQueue
holds other ready-to-execute threads. Only the
virtual processor associated with them can access these
two queues, so they need not be locked to be
updated. This virtual processor is the only one that
removes threads from a transferQueue. However, other
virtual processors can put threads on this queue, so
access to it is synchronized by a processor lock. The 
transferQueue is used for load balancing. 

Monitors. The Java language supports the monitor 
abstraction to allow user-level synchronization.
Conceptually, there is a monitor associated with ev- 
ery Java object. However, few monitors are ever used. 
Typically, a thread acquires the monitor on an ob- 
ject by executing one of the object's synchronized 
methods. Only a handful of monitors are held at any 
one time. A thread can (recursively) acquire the same 
monitor multiple times, but no thread can acquire 
a monitor held by another. Jalapeno uses its locking
mechanisms to implement this functionality.

When a thread attempts to acquire the monitor for
an object, there are six cases, depending on who owns
the monitor for the object, and whether the object
has a thick lock associated with it:

1. Object not owned, no thick lock. The thread
acquires a thin lock on the object as described
previously. (This is by far the most prevalent case.)

2. Object owned by this thread, no thick lock. The
thread increments the recursion-count bit field of
the status word using lwarx and stwcx instructions.
This synchronization is necessary since another
virtual processor might simultaneously convert
the thin lock to a thick one. If this bit field
overflows, the thin lock is converted to a thick lock.

3. Object owned by another thread, no thick lock.
This is the interesting case. Three options are
available. The thread could: try again (busy wait),
yield and try again (giving other threads a chance
to execute), or convert the thin lock to a thick one
(Case 6). We are investigating various
combinations of these three options.

4. Object not owned, thick lock in place. We
acquire the mutex for the thick lock, check that the
lock is still associated with the object, store the
thread index (TI) register in the ownerId field, and
release the mutex. By the time the mutex has been
acquired, it is possible that the thick lock has been
unlocked and even disassociated from the object.
In this extremely rare case, the thread starts over
trying to acquire the monitor for the object.

5. Object owned by this thread, thick lock in place.
We bump the recursionCount. Synchronization is
not needed since only the thread that owns a thick
lock can access its recursionCount field (or release
the lock).

6. Object owned by another thread, thick lock in
place. We acquire the mutex, check that the lock
is still associated with the appropriate object, and
yield to the enteringQueue, releasing the mutex at
the same time.

We are exploring two issues associated with re- 
leasing a monitor: what to do with threads on the 
enteringQueue when a thick lock is unlocked, and 
when to disassociate a thick lock from an object. 

Memory management subsystem. Of all of the Java 
language features, automatic garbage collection is 
perhaps the most useful and the most challenging 
to implement efficiently. There are many approaches 
to automatic memory management, no one of 
which is clearly superior in a server environment. 
Jalapeno is designed to support a family of inter- 
changeable memory managers. Currently, each man- 
ager consists of a concurrent object allocator and a 
stop-the-world, parallel, type-accurate garbage
collector. The four major types of managers supported
are: copying, noncopying, generational copying, and
generational noncopying.

Concurrent object allocation. Jalapeno's memory
managers partition heap memory into a large-object
space and a small-object space. Each manager uses
a noncopying large-object space, managed as a
sequence of pages. Requests for large objects are
satisfied on a first-fit basis. After a garbage collection
event, adjacent free pages are coalesced.

To support concurrent allocation of small objects by
copying managers, each virtual processor maintains
a large chunk of local space from which objects can
be allocated without requiring synchronization.
(These local chunks are not logically separate from
the global heap: an object allocated by one virtual
processor is accessible from any virtual processor that
gets a reference to it.) Allocation is performed by
incrementing a space pointer by the required size
and comparing the result to the limit of the local
chunk. If the comparison fails (not the normal case),
the allocator atomically obtains a new local chunk
from the shared global space. This technique works
without locking unless a new chunk is required. The
cost of maintaining local chunks is that memory
fragmentation is increased slightly, since each chunk may
not be filled completely.
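
The pointer-increment allocation described above can be sketched as follows. This is a minimal model under stated assumptions: ChunkAllocatorSketch, GlobalSpace, and LocalAllocator are hypothetical names, addresses are plain ints, and an atomic add stands in for whatever mechanism the real allocator uses to obtain a chunk atomically.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy bump-pointer allocation from per-processor chunks carved out of a shared space. */
final class ChunkAllocatorSketch {
    /** Shared global space; a chunk is handed out with a single atomic add. */
    static final class GlobalSpace {
        private final AtomicInteger nextFree = new AtomicInteger(0);
        private final int limit;
        final int chunkBytes;

        GlobalSpace(int limit, int chunkBytes) {
            this.limit = limit;
            this.chunkBytes = chunkBytes;
        }

        int acquireChunk() {
            int start = nextFree.getAndAdd(chunkBytes);
            if (start + chunkBytes > limit) {
                // The real system would trigger a garbage collection here.
                throw new IllegalStateException("global space exhausted");
            }
            return start;
        }
    }

    /** Owned by one virtual processor, so the fast path needs no synchronization. */
    static final class LocalAllocator {
        private final GlobalSpace global;
        private int pointer = 0;
        private int limit = 0;   // empty at first: the first request takes the slow path
        int chunksTaken = 0;

        LocalAllocator(GlobalSpace global) {
            this.global = global;
        }

        int allocate(int bytes) {
            if (bytes > global.chunkBytes) {
                throw new IllegalArgumentException("request belongs in the large-object space");
            }
            int result = pointer;
            int newPointer = result + bytes;   // increment the space pointer
            if (newPointer > limit) {          // compare against the chunk limit
                result = global.acquireChunk();
                limit = result + global.chunkBytes;
                newPointer = result + bytes;
                chunksTaken++;                 // any tail of the old chunk is wasted
            }
            pointer = newPointer;
            return result;
        }
    }
}
```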

Noncopying managers divide the small-object heap
into fixed-size blocks (currently 16 kilobytes). Each
block is dynamically subdivided into fixed-size slots.
The number of these sizes (currently 12), and their
values, are build-time constants that can be tuned
to fit an observed distribution of small-object sizes.
When an allocator receives a request for space, it
determines the size of the smallest slot that will
satisfy the request, and obtains the current block for
that size. To avoid locking overhead, each virtual
processor maintains a local current block of each size.
If the current block is full (not the normal case), it
makes the next block for that size with space
available the current block. If all such blocks are full (even
more rare), it obtains a block from the shared pool
and makes the newly obtained block current. Since
the block sizes and the number of slot sizes are
relatively small, the space impact of replicating the
current blocks for each virtual processor is insignificant.
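
Choosing the smallest slot that satisfies a request is a simple table lookup. In the sketch below the 12 slot sizes are invented for illustration (the text gives only their number and the 16-kilobyte block size), and SlotSizeSketch and its methods are hypothetical names.

```java
/** Toy size-class selection for a segregated-slot, noncopying allocator. */
final class SlotSizeSketch {
    static final int BLOCK_BYTES = 16 * 1024;  // block size given in the text

    // Hypothetical values: the text says only that there are currently 12 sizes.
    static final int[] SLOT_SIZES = {12, 16, 20, 24, 32, 48, 64, 128, 256, 512, 1024, 2048};

    /** Index of the smallest slot size that fits, or -1 if the request is too large. */
    static int sizeClassFor(int requestBytes) {
        for (int i = 0; i < SLOT_SIZES.length; i++) {
            if (requestBytes <= SLOT_SIZES[i]) {
                return i;
            }
        }
        return -1;  // would be handled by the large-object space instead
    }

    /** Number of slots obtained when a block is subdivided for the given size class. */
    static int slotsPerBlock(int sizeClass) {
        return BLOCK_BYTES / SLOT_SIZES[sizeClass];
    }
}
```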



From mutation to collection. Each virtual processor
has a collector thread associated with it. Jalapeno
operates in one of two modes: either the mutators
(normal threads) are running and the collection
threads are idle, or the mutators are idle and the
collection threads are running. Garbage collection is
triggered when a mutator explicitly requests it, when
a mutator makes a request for space that the
allocator cannot satisfy, or when the amount of
available memory drops below a predefined threshold.

Scalability requires that the transition between 
modes be accomplished as expeditiously as possible. 
During mutation, all collector threads are in a wait- 
ing state. When a collection is requested, the col- 
lector threads are notified and scheduled (normal- 
ly, as the next thread to execute) on their virtual 
processors. When a collector thread starts execut- 
ing, it disables thread switching on its virtual pro- 
cessor, lets the other collector threads know it has 
control of its virtual processor, performs some
initialization, and synchronizes with the other
collectors (at the first rendezvous point, described later).
When each collector knows that the others are
executing, the transition is complete.

Note that when all the collector threads are running,
all the mutators must be at yield points. It is not
necessary to redispatch any previously pending
mutator thread to reach this point. When the number of
mutator threads is large, this could be an important
performance consideration. Since all yield points in
Jalapeno are safe points, the collector threads may
now proceed with collection.

After the collection has completed, the collector 
threads re-enable thread switching on their virtual 
processors and then wait for the next collection.
Mutator threads start up automatically as the collector
threads release their virtual processors.

Parallel garbage collection. Jalapeno's garbage
collectors are designed to execute in parallel.
Collector threads synchronize among themselves at the end
of each of three phases. For this purpose Jalapeno
provides a rendezvous mechanism whereby no thread
proceeds past the rendezvous point until all have
reached it.

In the initialization phase, a copying collector thread
copies its own thread object and its
virtual-processor object. This ensures that updates to these objects
are made to the new copy and not to the old copy,
which will be discarded after the collection.

The noncopying managers associate with each block
of memory a mark array and an allocation array, with
one entry in each array for each slot. During
initialization, all mark array entries are set to zero. All
collector threads participate in this initialization.

In the root identification and scan phase, all collec- 
tors behave similarly. Collector threads contend for 
the JTOC and for each mutator thread stack, scan- 
ning them in parallel for roots (that is, object ref- 
erences conceptually outside the heap), which are 
marked and placed on a work queue. Then the ob- 
jects accessible from the work queue are marked. 
The marking operation is synchronized so exactly one 
collector marks each live object. As part of marking 
an object, a copying collector will copy its bits into 
the new space and overwrite the status word of the
old copy with a forwarding pointer to the new copy. 
(One of the low-order bits of this pointer is set to 
indicate that the object has been forwarded.) 
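
The synchronized marking step of a copying collector can be sketched with a compare-and-set on the status word. This is an illustration under stated assumptions, not the Jalapeno algorithm: ForwardingSketch and Mark are hypothetical names, ordinary status words are assumed to have the tag bit clear, all objects are assumed to have the same word-aligned size, and the copy space reserved by a collector that loses the race is simply wasted.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

/** Toy marking for a copying collector: the status word becomes a tagged forwarding pointer. */
final class ForwardingSketch {
    static final int FORWARDED_BIT = 1;  // assumed low-order tag bit

    /** Result of a marking attempt. */
    static final class Mark {
        final int newAddress;
        final boolean markedByCaller;  // true for exactly one collector per object

        Mark(int newAddress, boolean markedByCaller) {
            this.newAddress = newAddress;
            this.markedByCaller = markedByCaller;
        }
    }

    private final AtomicIntegerArray statusWords;        // one per old-space object
    private final AtomicInteger newSpaceTop = new AtomicInteger(0);
    private final int objectBytes;                       // assumed uniform and word-aligned

    ForwardingSketch(int objects, int objectBytes) {
        this.statusWords = new AtomicIntegerArray(objects);
        this.objectBytes = objectBytes;
    }

    /** Returns the object's new address whether or not this caller did the marking. */
    Mark mark(int object) {
        while (true) {
            int word = statusWords.get(object);
            if ((word & FORWARDED_BIT) != 0) {
                return new Mark(word & ~FORWARDED_BIT, false);  // already forwarded
            }
            int copy = newSpaceTop.getAndAdd(objectBytes);      // reserve room for the copy
            if (statusWords.compareAndSet(object, word, copy | FORWARDED_BIT)) {
                return new Mark(copy, true);                    // this collector copies it
            }
            // Another collector won the race; the loop re-reads its forwarding pointer.
        }
    }
}
```

Only the caller that sees markedByCaller would go on to copy the object's contents and add it to the work queue.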

Roots in the JTOC are identified by examining the
coindexed descriptor array that identifies the type
of each entry. Roots in a thread stack are identified
by analyzing the method associated with each stack
frame. Specifically, the local data area will have any
of the stack frame's ordinary roots; the parameter
spill area may have roots for the next (called) meth- 
od's stack frame; the nonvolatile register save area 
might contain roots from some earlier stack frame. 
Roots are located by examining the compiler-built 
reference maps that correspond to the methods on 
the stack and tracking which stack frames save which 
nonvolatile registers. 

The global work queue is implemented in virtual- 
processor-local chunks to avoid excessive synchro- 
nization. An object removed from the work queue 
is scanned for object references. (The offsets of these
references are obtained from the class object that
is the first entry in the object's TIB.) For each such
reference, the collector tries to mark the object. If
it succeeds, it adds the object to the work queue. In
the copying collectors the marking (whether it
succeeds or fails) returns the new address of the
referenced object.

In the completion phase, copying collectors simply
reverse the sense of the occupied and available
portions of the heap. Collector threads obtain local
chunks from the now empty "nursery" in
preparation for the next mutator cycle. A noncopying
collector thread performs the following steps:



• If this was a minor collection by the generational
collector, mark all old objects as live (identified from
the current allocation arrays).

• Scan all mark arrays to find free blocks, and return
them to the free block list.

• For all blocks not free, exchange mark and allo- 
cation arrays: the unmarked entries in the old mark 
array identify slots available for allocation. 
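
The three completion steps can be sketched as follows. This is a minimal model under stated assumptions: `SweepSketch`, `Block`, and the per-slot boolean arrays are hypothetical names, and clearing the new mark array after the exchange is an assumption the text does not spell out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SweepSketch {
    static final class Block {
        boolean[] mark;   // slots found live by this collection
        boolean[] alloc;  // slots the allocator considers in use
        Block(int slots) { mark = new boolean[slots]; alloc = new boolean[slots]; }
    }

    /** Completion phase of a noncopying collector; returns the free block list. */
    static List<Block> complete(List<Block> blocks, boolean minorCollection) {
        List<Block> freeBlocks = new ArrayList<>();
        for (Block b : blocks) {
            if (minorCollection) {
                // Step 1: old objects, found via the allocation array, count as live.
                for (int i = 0; i < b.alloc.length; i++) {
                    if (b.alloc[i]) b.mark[i] = true;
                }
            }
            boolean anyLive = false;
            for (boolean m : b.mark) anyLive |= m;
            if (!anyLive) {
                // Step 2: nothing survived, so the whole block is free.
                Arrays.fill(b.alloc, false);
                freeBlocks.add(b);
            } else {
                // Step 3: exchange the arrays; unmarked slots become allocatable.
                boolean[] t = b.alloc;
                b.alloc = b.mark;
                b.mark = t;
                Arrays.fill(b.mark, false); // assumed reset for the next cycle
            }
        }
        return freeBlocks;
    }
}
```

The exchange avoids a separate sweep over individual slots: the old mark array simply becomes the new allocation array.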

Performance issues. We are actively investigating both
noncopying and copying memory managers to
understand more fully the circumstances under which
each is to be preferred and to explore possibilities
for hybrid solutions. (The noncopying large-object
space is an example of a hybrid solution.) The
major advantages of a copying memory manager lie in
the speed of object allocation, and the compaction
of the heap performed during collection (providing
better cache performance). The major advantages
of a noncopying memory manager lie in faster
collection (objects are not copied), better use of
available space (copying managers waste half this space),
and simpler interaction between mutators and
collectors. (The optimizing compiler will be able to
pursue more aggressive optimizations, if it does not have
to be concerned that objects might move at every
safe point.) A system with a copying manager would
run overall faster between collections; a system with
a noncopying manager would offer smaller pause
times.

A noncopying policy would greatly simplify a
concurrent memory manager (one in which mutators and
collectors run at the same time): it would eliminate
the need for a read barrier and simplify the write 
barrier. 

Compiler subsystem. Jalapeno executes Java bytecodes
by compiling them to machine instructions at
run time. Three different, but compatible, compilers
are in use or under development. Development
of Jalapeno depended upon early availability of a
transparently correct compiler. This is the role of
Jalapeno's baseline compiler. However, by construction,
it does not generate high-performance target
code. 

To obtain high-quality machine code for methods
that are observed to be computationally intensive,
Jalapeno's optimizing compiler (described in the next
section) applies traditional static compiler
optimizations as well as a number of new optimizations that
are specific to the dynamic Java context. The cost
of running the optimizing compiler is too high for
it to be profitably employed on methods that are only
infrequently executed.

Jalapeno's quick compiler will compile each method
as it executes for the first time. It balances
compile-time and run-time costs by applying a few highly
effective optimizations. Register allocation is the
most important of these because the PowerPC has 
generous (32 fixed, 32 floating) register resources and 
most register operations are one cycle, while stor- 
age references may require several (sometimes very 
many) cycles. 

The quick compiler tries to limit compile time by an
overall approach of minimal transformation, efficient
data structures, and few passes over the source and
derived data. The source bytecode is not translated
to an intermediate representation. Instead, the byte- 
code is "decorated," with the results of analysis and 
optimization, in objects related to each bytecode in- 
struction. Optimizations performed include copy 
propagation to eliminate temporaries introduced by 
the stack-based nature of Java bytecode. The quick 
compiler's primary register allocator uses a graph
coloring algorithm. Coloring is not appropriate (due
to long compile time) for some methods (long
one-basic-block static initializers that need many symbolic
registers, for example). For such methods, the quick
compiler has a simpler, faster algorithm. We will in-
vestigate heuristics to detect these cases. We also 
plan to add inline compilation of short methods that 
are final, static, or constructors and to explore local- 
context (peephole) optimizations. 

The code produced by all three compilers must
satisfy Jalapeno's calling and preemption conventions.
They ensure that threads executing the methods they 
compile will respond in a timely manner to attempts 
to preempt them. Currently, explicit yield points are 
compiled into method prologues. Eventually, yield 
points will be needed on the "back edges" of loops 
that cannot be shown to contain other yield points. 

The compilers are also responsible for maintaining 
tables that support exception handling and that
allow the memory managers to find object references
on thread stacks. (These tables are also used by
Jalapeno's debugger.) When a garbage collection takes
place, each of the methods represented on the
thread stack will be at a garbage collection safe point.
Safe points are the call sites, dynamic link sites,
thread yield sites, possible exception-throw sites, and
allocation request sites. For any given safe point
within a method body, the compiler that created the
method body must be able to describe where the live
references exist. A reference map identifies, for each 
safe point, the locations of object references. 

We have not yet implemented a comprehensive strat- 
egy to select compilers for methods. Switching from 
the quick to the optimizing compiler will be done 
based on run-time profiling in the manner of Self.

A dynamic optimizing compiler 

We anticipate that the bulk of the computation on 
a Java application will involve only a fraction of the 
Java source code. Jalapeno's optimizing compiler is 
intended to ensure that these bytecodes are compiled 
efficiently. The optimizing compiler is dynamic: it 
compiles methods while an application is running. 
In the future, the optimizing compiler will also be 
adaptive: it will be invoked automatically on com- 
putationally intensive methods. The goal of the op- 
timizing compiler is to generate the best possible 
code for the selected methods on a given compile-time
budget. In addition, its optimizations must deliver
significant performance improvements while
correctly preserving Java's semantics for exceptions,
garbage collection, and threads. Reducing the cost
of synchronization and other thread primitives is es- 
pecially important for achieving scalable perfor- 
mance on SMP servers. Finally, it should be possible 
to retarget the optimizing compiler to a variety of 
hardware platforms with minimal effort. Building a 
dynamic optimizing compiler that achieves these 
goals is a major challenge. 

This section provides an overview of the Jalapeno
optimizing compiler; further details are available
elsewhere. The optimizing compiler's structure
is shown in Figure 4. 

From bytecode to intermediate representations. The
optimizing compiler begins by translating Java
bytecodes to a high-level intermediate representation (HIR).
This is one of three register-based intermediate
representations that share a common implementation.
(Register-based representations provide greater
flexibility for code motion and code transformation than
do representations based on trees or stacks. They
also allow a closer fit to the instruction sets of
Jalapeno's target architecture.) Instructions of these
intermediate representations are n-tuples: an operator
and zero or more operands. Most operands
represent symbolic registers, but they can also
represent physical registers, memory locations,
constants, branch targets, or types. Java's type structure
is reflected in these intermediate representations:
there are distinct operators for similar operations
on different primitive types, and operands carry type
information. Instructions are grouped into
extended basic blocks that are not terminated by
method calls or by instructions that might cause an
exception to be thrown. (Extra care is required when
performing data flow analysis or code motion on
these extended basic blocks.) These intermediate
representations also include space for the caching
of such optional auxiliary information as
reaching-definition sets, dependence graphs, and
encodings of loop-nesting structure.

Figure 4 Jalapeno's optimizing compiler

The translation process discovers the extended-basic-block
structure of a method, constructs an exception
table for the method, and creates HIR instructions
for bytecodes. It discovers and encodes type
information that can be used in subsequent optimizations
and that will be required for reference maps.
Certain simple "on-the-fly" optimizations (copy
propagation, constant propagation, register renaming for
local variables, dead-code elimination, etc.) are also
performed. (Even though more extensive versions
of these optimizations are performed in later
optimization phases, it is worthwhile to perform them
here because doing so reduces the size of the
generated HIR and hence subsequent compile time.) In
addition, suitably short final or static methods are
moved in line.

Copy propagation is an example of an on-the-fly
optimization performed during the translation. Java
bytecodes often contain instruction sequences that
perform a calculation and store the result into a
local variable. A naive approach to intermediate
representation generation results in the creation of a
temporary register for the result of the calculation
and an additional instruction to move the value of
this register into the local variable. A simple copy
propagation heuristic eliminates many of these
unnecessary temporary registers. When storing from
a temporary into a local variable, the most recently 
generated instruction is inspected. If this instruction 
created the temporary to store its result, it is mod- 
ified to write this result directly to the local variable 
instead. 
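
A minimal sketch of this heuristic is shown below. The names `CopyPropSketch`, `Instr`, `emit`, and `storeToLocal`, and the string-based operands, are hypothetical and chosen only for illustration. The sketch also assumes that a stack temporary is consumed by the store and is not read again afterward.

```java
import java.util.ArrayList;
import java.util.List;

class CopyPropSketch {
    static final class Instr {
        final String op;
        String dest;              // may be retargeted by the heuristic
        final String[] srcs;
        Instr(String op, String dest, String... srcs) {
            this.op = op; this.dest = dest; this.srcs = srcs;
        }
    }

    final List<Instr> code = new ArrayList<>();

    void emit(String op, String dest, String... srcs) {
        code.add(new Instr(op, dest, srcs));
    }

    /** Translates a bytecode store of temporary temp into local variable local. */
    void storeToLocal(String temp, String local) {
        Instr last = code.isEmpty() ? null : code.get(code.size() - 1);
        if (last != null && temp.equals(last.dest)) {
            last.dest = local;          // write the result directly to the local
        } else {
            emit("move", local, temp);  // otherwise fall back to an explicit copy
        }
    }
}
```

For example, translating `iload 0; iload 1; iadd; istore 2` with this helper would produce a single add whose destination is the local variable, rather than an add into a temporary followed by a move.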

Translation proceeds by abstract interpretation of the 
bytecodes. The types (and values if known at com- 
pile time) of local variables and the entries on the 
execution stack as defined by the Java Virtual Machine
Specification form the symbolic state of the
abstract interpretation. (Because these types are not 
statically available from Java bytecodes, all Jalape- 
no's compilers must, in effect, track this symbolic 
state.) Abstract interpretation of a bytecode involves 
generating the appropriate HIR instruction(s) and up- 
dating the symbolic state. 

The main loop of the translation algorithm uses a
work list containing blocks of code with their
starting symbolic states. Initially, this work list contains
the entries for code beginning at bytecode 0 and for
each of the method's exception handlers (with empty
symbolic states). Code blocks are successively
removed from the work list and interpreted as if they
were extended basic blocks. If a branch is
encountered, the block is split and pieces of it are added
to the work list. (At control-flow join points, the
values of stack operands may differ on different
incoming edges, but the types of these operands must
match. An element-wise meet operation is used on
the stack operands to update the symbolic state at
these points.) If a branch is forward, the piece from
the beginning of the block to the branch is
tentatively treated as a completed extended basic block.
The pieces from the branch to its target and from
the target to the end of the block are added to the
work list. If the branch is backward, the piece from
the branch to the end of the block is added to the
work list. If the target of a backward branch is in the
middle of an already-generated extended basic block,
this block is split at the target point. If the stack is
not empty at the target point, the block must be
regenerated because its start state may be incorrect.

To minimize the number of times HIR is generated
for the same bytecodes, a simple greedy algorithm
selects the block with the lowest starting bytecode
index for abstract interpretation. This simple
heuristic relies on the fact that, except for loops, all
control-flow constructs are generated in topological
order, and that the control flow graph is reducible.
Fortuitously, this heuristic seems to obtain optimal
extended-basic-block orderings for methods
compiled with current Java source compilers.

High-level optimization. The instructions in the HIR
are closely patterned after Java bytecodes, with two
important differences: HIR instructions operate on
symbolic register operands instead of an implicit
stack, and the HIR contains separate operators to
implement explicit checks for run-time exceptions (e.g.,
array-bounds checks). The same run-time check can
often cover more than one instruction. (For
example, incrementing A[i] may involve two separate
array accesses, but requires only a single bounds check.)
Optimization of these check instructions reduces ex- 
ecution time and facilitates additional optimization. 

Currently, simple optimization algorithms with mod- 
est compile-time overheads are performed on the 
HIR. These optimizations fall into three classes: 

1. Local optimizations. These optimizations are
local to an extended basic block, e.g., common
subexpression elimination, elimination of redundant
exception checks, and redundant load elimination.

2. Flow-insensitive optimizations. To optimize across
basic blocks, the Java Virtual Machine Specification
assurance that "every variable in a Java program
must have a value before it is used" is exploited.
If a variable is only defined once, then
that definition reaches every use. For such var- 
iables, "def-use" chains are built, copy propaga- 
tion performed, and dead code eliminated with- 
out any expensive control-flow or data-flow 
analyses. Additionally, the compiler performs a 
conservative flow-insensitive escape analysis for 
scalar replacement of aggregates and semantic ex- 
pansion transformations of calls to standard Java 
library methods. 

This technique catches many optimization opportunities,
but other cases can only be detected by
flow-sensitive algorithms.

3. In-line expansion of method calls. To expand a
method call in line at the HIR level, the HIR for
the called method is generated and patched into
the HIR of the caller. A static size-based heuristic
is currently used to control automatic in-line
expansion of calls to static and final methods. For
nonfinal virtual method calls, the optimizing
compiler predicts the receiver of a virtual call to be
the declared type of the object. It guards each
in-line virtual method with a run-time test to verify
fault to a normal virtual method invocation if it 
is not. This run-time test is safe in the presence 
of dynamic class loading. 
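
At the source level, guarded in-line expansion of a virtual call might look like the sketch below. The compiler performs this transformation on HIR, not on Java source; the classes `Square` and `FramedSquare` and the exact form of the guard (an exact-class comparison) are assumptions for illustration.

```java
class GuardedInlineSketch {
    static class Square {
        final double side;
        Square(double side) { this.side = side; }
        double area() { return side * side; }
    }

    // A subclass that could be loaded later and overrides area().
    static class FramedSquare extends Square {
        final double frame;
        FramedSquare(double side, double frame) { super(side); this.frame = frame; }
        @Override double area() { return (side + 2 * frame) * (side + 2 * frame); }
    }

    /** What the call "s.area()" might become after guarded in-line expansion. */
    static double areaGuarded(Square s) {
        if (s.getClass() == Square.class) {
            return s.side * s.side;   // in-lined body of Square.area()
        }
        return s.area();              // prediction failed: normal virtual call
    }
}
```

Because the guard is evaluated on every call, a subclass that is loaded after compilation still dispatches correctly through the fallback path.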

Since Jalapeno is written in Java, the same frame- 
work used to expand application methods in line can 
also be used to expand calls to run-time methods in 
line (notably for synchronization and for object al- 
location). In general, it is possible to expand calls in 
line all the way from the application code through 
Java libraries down to the Jalapeno run-time system, 
providing excellent opportunities for optimization. 

Low-level optimization. After high-level analyses
and optimizations have been performed, HIR is
converted to a low-level intermediate representation (LIR).
The LIR expands HIR instructions into operations that
are specific to the Jalapeno virtual machine's object
layout and parameter-passing conventions. For
example, virtual method invocation is expressed as a
single HIR instruction analogous to the invokevirtual
bytecode. This single HIR instruction is converted into
three LIR instructions that obtain the TIB pointer
from an object, obtain the address of the
appropriate method body from the TIB, and transfer control
to the method body.

Since field and header offsets are now available as 
constants, new opportunities for optimization are ex- 
posed. In principle, any high-level optimization could 
also be performed on the LIR. However, since the
LIR can be two to three times larger than the
corresponding HIR, more attention needs to be paid to
compile-time overhead when performing LIR
optimizations. Currently, local common subexpression
elimination is the only optimization performed on
LIR. Since HIR and LIR share the same infrastructure,
the code that performs common subexpression
elimination on HIR can be reused without
modification on LIR.

Also, as the last step of low-level optimization, a
dependence graph is constructed for each extended
basic block. The dependence graph is used for
instruction selection (see next subsection). Each node of
the dependence graph is an LIR instruction, and each
edge corresponds to a dependence constraint
between a pair of instructions. Edges are built for true,
anti, and output dependences for both registers and
memory. Control, synchronization, and exception
dependence edges are also built. Synchronization
constraints are modeled by introducing synchronization
dependence edges between synchronization
operations (monitor_enter and monitor_exit) and memory
operations. These edges prevent code motion of
memory operations across synchronization points.
Java exception semantics is modeled by exception
dependence edges connecting different exception 
points in an extended basic block. Exception depen- 
dence edges are also added between these excep- 
tion points and register write operations of local var- 
iables that are "live" in exception handler blocks, if 
there are any in the method. This precise modeling 
of dependence constraints enables aggressive code 
reordering in the next optimization phase. 

Instruction selection and machine-specific
optimization. After low-level optimization, the LIR is
converted to a machine-specific intermediate representation
(MIR). The current MIR reflects the PowerPC
architecture. (Additional sets of MIR instructions can be
introduced if Jalapeno is ported to different
architectures.) The dependence graphs for the extended
basic blocks of a method are partitioned into trees.
These are fed to a bottom-up rewriting system
(BURS), which produces the MIR. Then symbolic
registers are mapped to physical registers. A prologue
is added at the beginning, and an epilogue at the end, 
of each method. Finally, executable code is emitted. 

BURS is a code-generator generator, analogous to 
scanner and parser generators. Instruction selection 
for a desired target architecture is specified by a tree
grammar. Each rule in the tree grammar has an as- 
sociated cost (reflecting the size of instructions gen- 
erated and their expected cycle times) and code-gen- 
eration action. The tree grammar is processed to 
generate a set of tables that drive the instruction se- 
lection phase at compile time. 

There are two key advantages of using BURS
technology for instruction selection. First, the
tree-pattern matching performed at compile time uses
dynamic programming to find a least-cost parse (with
respect to the costs specified in the tree grammar)
for any input tree. Second, the cost of building the
BURS infrastructure can be amortized over several
target architectures. The architecture-specific
component is relatively short; Jalapeno's PowerPC tree
grammar is about 300 rules.

The tree-pattern matching in BURS was originally
developed for code generation from tree-based
intermediate representations, usually in the absence of
global optimizations. Previous approaches to
partitioning directed acyclic graphs for tree-pattern
matching considered only graphs containing
register-true-dependence edges. Our approach is more
general because it considers partitioning in the presence
of both register and nonregister dependences. The
legality constraints for this partitioning are
nontrivial.
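
The idea of a least-cost parse over a tree grammar can be illustrated with a small dynamic-programming matcher. This is a sketch under stated assumptions, not the BURS implementation: the six-rule grammar, its costs, and the names `BursSketch`, `Rule`, and `NT` are invented, there is only one nonterminal, and the costs are computed at match time rather than from precomputed tables as a real BURS generator would arrange.

```java
class BursSketch {
    static final String NT = "<reg>"; // nonterminal leaf in a pattern (assumed name)
    static final int NO_MATCH = -1;

    static final class Node {
        final String op;
        final Node[] kids;
        int best = -1;                // memoized least cost, -1 = not yet computed
        Node(String op, Node... kids) { this.op = op; this.kids = kids; }
    }

    static final class Rule {
        final int cost;
        final Node pattern;
        Rule(int cost, Node pattern) { this.cost = cost; this.pattern = pattern; }
    }

    static Node n(String op, Node... kids) { return new Node(op, kids); }

    // A made-up grammar; every rule derives <reg>.
    static final Rule[] RULES = {
        new Rule(0, n("REG")),
        new Rule(1, n("CONST")),                             // load immediate
        new Rule(1, n("ADD", n(NT), n(NT))),                 // register add
        new Rule(1, n("ADD", n(NT), n("CONST"))),            // add immediate
        new Rule(3, n("LOAD", n(NT))),                       // load via register
        new Rule(3, n("LOAD", n("ADD", n(NT), n("CONST")))), // load with displacement
    };

    /** Cost of the subtrees bound to nonterminals when pat matches t, or NO_MATCH. */
    static int match(Node pat, Node t) {
        if (pat.op.equals(NT)) {
            int c = cost(t);
            return c == Integer.MAX_VALUE ? NO_MATCH : c;
        }
        if (!pat.op.equals(t.op) || pat.kids.length != t.kids.length) return NO_MATCH;
        int sum = 0;
        for (int i = 0; i < pat.kids.length; i++) {
            int c = match(pat.kids[i], t.kids[i]);
            if (c == NO_MATCH) return NO_MATCH;
            sum += c;
        }
        return sum;
    }

    /** Least cost of reducing t to <reg>; Integer.MAX_VALUE if no rule applies. */
    static int cost(Node t) {
        if (t.best >= 0) return t.best;
        int best = Integer.MAX_VALUE;
        for (Rule r : RULES) {
            int sub = match(r.pattern, t);
            if (sub != NO_MATCH && r.cost + sub < best) best = r.cost + sub;
        }
        return t.best = best;
    }
}
```

For the tree `LOAD(ADD(REG, CONST))`, the matcher prefers the single load-with-displacement rule (cost 3) over an add-immediate followed by a plain load (cost 4).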

After the MIR is constructed, live variable analysis
is performed to determine the live ranges of
symbolic registers and the stack variables that hold
object references at garbage-collection-safe points. The
standard live variable analysis has been modified
to handle the extended basic blocks of the factored
control flow graph as described by Choi et al.

Next, the optimizing compiler employs the linear scan
global register-allocation algorithm to assign
physical machine registers to symbolic MIR registers. This
algorithm is not based on graph coloring, but greed- 
ily allocates physical to symbolic registers in a single 
linear time scan of the symbolic registers' live ranges. 
This algorithm is several times faster than graph col- 
oring algorithms and results in code that is almost 
as efficient. More sophisticated (and more costly) 
register allocation algorithms will eventually be used 
at higher levels of optimization (see next subsection). 
(The irony of currently using a more expensive
algorithm in the quick compiler than in the optimizing
compiler is not lost on the authors.)
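
A compact sketch of linear scan allocation over live ranges follows. It is a textbook-style illustration, not Jalapeno's allocator: the class names, the spill policy (evict the active range that ends last), and the assumption that each live range is a single interval with a unique name are all choices made for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;
import java.util.TreeSet;

class LinearScanSketch {
    static final class Interval {
        final String name;   // symbolic register; assumed unique
        final int start, end;
        int reg = -1;        // assigned physical register, or -1 if spilled
        Interval(String name, int start, int end) {
            this.name = name; this.start = start; this.end = end;
        }
    }

    static void allocate(List<Interval> intervals, int numRegs) {
        List<Interval> byStart = new ArrayList<>(intervals);
        byStart.sort(Comparator.comparingInt((Interval i) -> i.start));

        // Ranges currently holding a register, ordered by increasing end point.
        TreeSet<Interval> active = new TreeSet<>(
            Comparator.comparingInt((Interval i) -> i.end).thenComparing(i -> i.name));
        Deque<Integer> free = new ArrayDeque<>();
        for (int r = 0; r < numRegs; r++) free.add(r);

        for (Interval cur : byStart) {
            // Expire ranges that ended before cur begins and reclaim their registers.
            while (!active.isEmpty() && active.first().end < cur.start) {
                free.add(active.pollFirst().reg);
            }
            if (!free.isEmpty()) {
                cur.reg = free.poll();
                active.add(cur);
            } else if (!active.isEmpty() && active.last().end > cur.end) {
                // Spill the active range that lives longest and take its register.
                Interval victim = active.pollLast();
                cur.reg = victim.reg;
                victim.reg = -1;
                active.add(cur);
            }
            // Otherwise cur itself stays spilled (reg == -1).
        }
    }
}
```

Each live range is visited once in order of its start point, which is what keeps the approach cheap compared with building and coloring an interference graph.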

A method prologue allocates a stack frame, saves
any nonvolatile registers needed by the method, and
checks to see if a yield has been requested. The
epilogue restores any saved registers and deallocates
the stack frame. If the method is synchronized, the
prologue locks, and the epilogue unlocks, the
indicated object.

The optimizing compiler then emits binary executable
code into the array of ints that is the method
body. This assembly phase also finalizes the excep- 
tion table and the reference map of the instruction 
array by converting intermediate-instruction offsets 
into machine-code offsets. 

Levels of optimization. The optimizing compiler can 
operate at different levels of optimization. Each level 
encompasses all the optimizations at the previous 
levels and some additional ones. Level 1 contains ex- 
actly the optimizations described above. (Primarily 
for debugging purposes, there is Level 0, which is 
like Level 1 without any high-level or low-level op- 
timizations.) Two levels of more aggressive optimi- 
zation are planned. 

Level 2 optimizations will include code specializa- 
tion, intraprocedural flow-sensitive optimizations 
based on static single assignment (SSA) form (both
scalar and array), sophisticated register allocation,
and instruction scheduling. Instruction scheduling
is currently being implemented. It uses an MIR
dependence graph built with the same code that
builds the LIR dependence graph used by BURS.

Level 3 optimizations will include interprocedural 
analysis and optimizations. Currently, interprocedural
escape analysis and interprocedural optimization
of register saves and restores are being
implemented.

Modalities of operation. The optimizing compiler's 
front end (translation to HIR and high-level
optimizations) is independent of Jalapeno's object layout
and calling conventions. This front end is being used
in a bytecode optimization project.

The intended mode of operation for the optimizing
compiler is as a component of an adaptive JVM.
Figure 5 shows the overall design of such a virtual
machine. The optimizing compiler is the key
constituent of Jalapeno's adaptive optimization system,
which will also include on-line measurement and
controller subsystems currently being designed. The
on-line measurement subsystem will monitor the
performance of individual methods using both software
sampling and profiling techniques and information
from a hardware performance monitor. The
controller subsystem will be invoked when the on-line
measurement subsystem detects that a certain
performance threshold has been reached. The controller
will use the profiling information to build an
"optimization plan" that describes which methods should
be compiled and with what optimization levels. The
optimizing compiler will then be invoked to compile
methods in accordance with the optimization plan.
The on-line measurement subsystem can continue
monitoring individual methods, including those
already optimized, to trigger further optimization
passes as needed.

Figure 5 The optimizing compiler in an adaptive JVM

In addition to the dynamic compilation mode de- 
scribed above, the optimizing compiler can be used 
as a static compiler as shown in Figure 6. In this 
mode, the optimized code generated by the optimizing
compiler is stored in the boot image (see
Appendix B). The optimized compilation is performed
off line, prior to execution of the Jalapeno virtual
machine. (Eventually, we hope to be able to combine
both modes. An application would run for a
while. The adaptive optimization system would
optimize the JVM for that application. Finally, this
optimized JVM would get written out as a boot image
specialized for the application.)

The optimizing compiler can also be used as a JIT
compiler compiling all methods the first time they
are executed. When benchmarking the performance
of the optimizing compiler, it is used both as a static
boot-image compiler (for JVM code in the boot
image) and as a JIT compiler (for the benchmark code).

Figure 6 The optimizing compiler as a static compiler

Current status

The core functionality required to implement all Java
language features is all but complete. Some of the
more esoteric thread functions (suspend, resume,
timed wait, etc.) have yet to be implemented. The
load-balancing algorithm is rudimentary. Support for
finalization, weak references, and class verification
is not yet in place. The quick compiler is nearing
completion. The basic framework of the optimizing
compiler and some of its Level 1 optimizations are now
up and running. More advanced optimizations are
being developed. The on-line measurement and
controller subsystems are in the design stage.

Figure 7 Compilation speeds

Jalapeno's support for Java library code is limited
by the fact that Jalapeno is written in Java code.
Jalapeno can handle library methods written in Java
code, but native methods must be rewritten.
Implementing the Java Native Interface (JNI) will allow
Jalapeno to call native methods written to that
interface. JNI is a C language interface to a virtual
machine that, in the case of Jalapeno, is not written
in C. We do not yet understand the performance or
implementation issues that will arise when we attempt
to provide JNI services in Jalapeno.



The Jalapeno project is in transition. The initial
function is mostly in place. Many of Jalapeno's
mechanisms are still rudimentary. It is time to measure
performance, identify bottlenecks, and replace them
with more efficient implementations. Some of the
"low-hanging fruit" has already been picked: uncon- 
tended lock acquisitions have been moved in line, 
for example. However, the performance measure- 
ments of baseline compiled code were so inconclu- 
sive that we have been reluctant to trust our mea- 
surements until the optimizing compiler was 
available. 

There are also bugs, of both function and perfor- 
mance, to be isolated, identified, and fixed. 

In trying to assess the current performance of
Jalapeno, it is useful to make comparisons with the JVM
in the IBM Developer Kit (DK) for AIX, Java
Technology Edition, Version 1.1.8 that uses the JIT
compiler developed in IBM Tokyo. It should be noted
that while Jalapeno has the luxury of being targeted
to SMP servers, the IBM JVM must accommodate all
PowerPC computers running AIX. The reader should
also keep in mind that the performance figures
quoted here represent a snapshot in time: both
Jalapeno and the IBM JVM are constantly being improved.

Performance figures are given for Jalapeno's base- 
line and optimizing compilers. In both cases, the boot 
image has been compiled with the optimizing com- 
piler, and the indicated compiler is used primarily 
for the indicated application (and for any
dynamically linked classes of the JVM). The optimizing
compiler figures reflect currently implemented Level 1
optimizations. A nongenerational copying memory 
manager is used in both cases. 

Figure 7 compares the time spent in compilation by 
Jalapeno's baseline and optimizing compilers (writ- 
ten in the Java language and optimized by Jalapeno's 
optimizing compiler) and the IBM DK JIT compiler 
(implemented in native code). The baseline compiler 
is the clear winner, running 30 to 45 times faster than 
the JIT compiler. The optimizing compiler is nearly 
as fast as the JIT compiler, but not quite. 

Figure 8 compares the performance of code
produced by the three compilers to code interpreted by
the DK without a JIT compiler on microbenchmarks
from Symantec. (The graph has been truncated
to facilitate comparison of the performance of
Jalapeno's optimizing compiler and the IBM DK JIT
compiler.) The baseline compiled code is consistently
twice as fast as interpreted code. The IBM JIT-compiled
code is much better: between four and 40 times faster
than the interpreted code. Jalapeno's optimizing
compiler is roughly competitive with the JIT
compiler.

Figure 8 Symantec microbenchmarks (166 MHz PowerPC 604e, AIX 4.3, copying garbage collector, Level 1
optimization); each measurement is an average of 10 runs

228 ALPERN ET AL.

IBM SYSTEMS JOURNAL, VOL 39, NO 1, 2000

Figure 9 makes the same comparison on the
SPECjvm98 benchmarks run on the medium (10
percent) problem size. Again the baseline compiler
is usually about twice as good as the interpreter.
Again the JIT compiler is much better. Again the
optimizing compiler is usually competitive with the 
JIT compiler. 

Figure 10 shows the performance of the Jalapeno
optimizing compiler running 12 virtual processors
on a 12-way SMP (with 262 MHz PowerPC S7a
processors running AIX 4.3) using the portable business
object benchmark (pBOB v 2.0a). This benchmark
(see Baylor et al. in this issue for details), modeled
after the TPC-C specification, mimics the business
logic in a transactional workload. Performance
improves almost linearly to 10 warehouses, peaks at 13,
and then degrades very slowly. This shows that, on
this benchmark at least, Jalapeno scales very well.



Related work

Implementing a Java virtual machine and its related 
subsystems (including the optimizing compiler) in 
Java code presents several challenges. Taivalsaari
also describes a "Java in Java" JVM implementation
designed to examine the feasibility of a high quality
virtual machine written in Java. One drawback of this
approach is that it runs on another JVM, which adds
performance overhead because of the two-level
interpretation process. The Rivet JVM from MIT
(Massachusetts Institute of Technology) also runs
on top of another JVM. Our approach avoids the need
for another JVM by bootstrapping the system (see
Appendix B). The JVM of IBM's VisualAge* for Java
is written in Smalltalk. Other JVMs are written
in native code. 

Perhaps the most exciting [...] HotSpot. The object
models of HotSpot and Jalapeno are somewhat similar:
objects are referenced directly (rather than through
handles) and objects have a two-word header. In both
models, information about the object's class is
available through a reference in the object header.
Figure 9 SPECjvm98 benchmarks (medium size)

HotSpot initially interprets bytecodes, compiling
(and moving in line) frequently called methods. Jala- 
peno's quick compiler will play a role similar to 
HotSpot's interpreter. All else being equal, this 
should give a start-up advantage to HotSpot and a 
performance advantage to Jalapeno. We do not ex- 
pect either advantage to be dramatic, but this remains 
to be seen. If unoptimized Jalapeno code performs 
better than interpreted HotSpot code, this will al- 
low the Jalapeno optimizing compiler to focus more 
resources on the code that it optimizes. Implement- 
ing Jalapeno in Java code allows the optimizing com- 
piler to move in line and optimize frequently called 
run-time services that HotSpot accesses through calls 
to native methods (heavily optimized C routines). 

HotSpot implements Java threads as host operating
system threads. These threads are fully preemptive.
Jalapeno schedules its own quasi-preemptive
threads. We expect that this will allow support for
more threads, lighter-weight synchronization, and
smoother transition from normal operation to garbage
collection (especially in the presence of a large
number of threads). HotSpot's per-thread method
activation stacks conform to host operating system
calling conventions. This should give Jalapeno a minor
space and performance advantage (although
Jalapeno will take a performance hit when it does
call C code).

Both HotSpot and Jalapeno support type-accurate
garbage collection. Jalapeno supports a family of
memory managers. None of Jalapeno's collectors is
as sophisticated as HotSpot's, but on an SMP
Jalapeno's collectors run in parallel using all available
CPUs. HotSpot uses a generational scheme with
"mark-and-compact" for major collections. To minimize
pause times, HotSpot can use an incremental
"train" collector. This collector makes frequent
short collections. Note that this will exacerbate any
transition-to-collection delays.

We do not have information on HotSpot's locking
mechanisms.

Squeak is a Smalltalk virtual machine that is written
in Smalltalk. It produces a production version
by translating the virtual machine to C for compilation
and linking. The translator is also written in
Smalltalk.

Dynamic compilation (called dynamic translation or
just-in-time compilation) has been a key ingredient
in a number of previous implementations of
object-oriented languages. Deutsch and Schiffman's
high-performance implementation of Smalltalk-80
dynamically translated Smalltalk bytecodes to native
code; their compiler was quite similar to Jalapeno's
baseline compiler. Implementations of the Self
language also relied on dynamic compilation to achieve
high performance. Self compilers utilized register-based
intermediate representations that are roughly
equivalent to the one used by Jalapeno's optimizing
compiler. Recently, a number of just-in-time compilers
have been developed for the Java language.
Some of these compilers translate bytecodes to a
three-address code, perform simple optimizations
and register allocation, and then generate target
machine code.

DAISY is a VLIW (very long instruction word) emulator
that performs "on-the-fly" translation of different
architecture instruction sets, including Java
bytecodes, to a VLIW architecture. It uses a VLIW
tree-like representation for instruction scheduling and
register allocation.

A number of previous systems have utilized more 
specialized forms of dynamic compilation to selec- 
tively optimize program hot spots by exploiting "run- 
time constants." In general, these systems em- 
phasize extremely fast dynamic compilation, often 
performing extensive off-line precomputations to 
avoid constructing any explicit representation of the 
program fragment being compiled. 

A large collection of work addresses optimizations
specific to object-oriented languages, such as class
analysis, both intraprocedural and interprocedural,
class hierarchy analysis and optimizations,
receiver class prediction, method specialization,
and call graph construction. Other
optimizations relevant to Java compilation include
bounds check elimination and semantic expansion.

Conclusions 

Jalapeno is a virtual machine for Java servers written
in the Java programming language. Run-time services,
conventionally supported with native methods,
are implemented primarily in Java code.



Figure 10 Jalapeno performance for the pBOB v 2.0a
benchmark on a 12-way SMP (1-60 threads,
12 virtual processors)

Jalapeno's object layout supports single-instruction
field access, three-instruction access to array elements,
hardware null-pointer checks, and four-instruction
virtual-method dispatches. Fast access to
static fields and methods through a global JTOC array
is also achieved.

Jalapeno's threads are multiplexed by virtual pro- 
cessors. Thread switching is quasi-preemptive. Three 
different locking mechanisms provide light-weight 
synchronization without operating system support. 

Jalapeno's memory management subsystem supports
a family of memory managers, each consisting of a
concurrent object allocator and a parallel, type-accurate,
stop-the-world garbage collector. Generational
and nongenerational, copying and noncopying
collectors are supported. Incremental and
concurrent collectors are being investigated.

Jalapeno's three interoperable compilers provide different
levels of dynamic optimization, ensure timely
thread preemption, and produce tables that support
exception handling, location of references in stacks,
and debugging.

Jalapeno's optimizing compiler produces high-quality
code for methods that have been identified as
frequently executed or computationally intensive.
Methods to be recompiled will be selected dynamically
based on run-time profiling.

We have established the feasibility of building a virtual
machine for Java servers in the Java language.
We have not yet demonstrated that such a virtual 
machine can achieve and sustain world-class perfor- 
mance. We are working on it. 

Appendix A: MAGIC 

To allocate an object, Jalapeno's memory managers
must access raw memory to obtain a piece of available
space of the required size. They "walk" the
thread stacks to identify object references in the stack
frames. A copying manager accesses object headers
to mark objects during garbage collection and accesses
raw memory to copy an object. Exception handling
requires an unstructured transfer of control to
the appropriate catch block ("go to" is forbidden in
the Java language). Static data and methods are accessed
through a dedicated machine register that
cannot itself be accessed from Java instructions. Input
and output require access to operating system
services unknown to the Java language. Thread
switching depends on receiving periodic interrupts
from the operating system. Jalapeno's locking mechanisms
are implemented using PowerPC instructions
that cannot be expressed as Java bytecodes. None
of these operations can be performed without
breaching Java's programming model.

To implement Jalapeno in Java code, it is necessary
to augment Java's functionality to include capabil- 
ities conventionally required by native methods: 

• To invoke operating system services 

• To use architecture-specific machine instructions 

• To access machine registers and memory 

• To coerce object references to raw addresses and 
vice versa 

• To transfer execution to an arbitrary address 

These capabilities must be granted to Jalapeno, but
Jalapeno must prevent them from becoming available
to user applications.

Jalapeno's compilers enable such transgressions with
the help of a special MAGIC class. The methods of
this class correspond to the extra-Java operations
Jalapeno must be able to perform. The bodies of
these methods are empty. Java's source compilers
can compile them. However, Jalapeno's compilers
ignore the resulting bytecodes. Rather, they recognize
the name of the MAGIC class and insert the necessary
machine code in line. To make sure that user
code does not evade Java's restrictions, Jalapeno's
compilers will verify, when they encounter a call to
a MAGIC method, that the method they are compiling
is an authorized part of the Jvm.
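
The check described above can be sketched in Java. This is an illustrative sketch only, not Jalapeno's code: the class names MagicSketch and MagicCallChecker, the allow-list of caller classes, the caller name VM_Allocator used in the usage note, and the strings returned in place of emitted machine code are all assumptions made for the example. Only the method name objectAsAddress is taken from the text.

```java
import java.util.Set;

/** Hypothetical stand-in for the MAGIC class. A MAGIC-aware compiler would
 *  ignore these bodies and insert machine code in line instead. */
final class MagicSketch {
    private MagicSketch() {}

    /** Method name from the text. A non-void Java method cannot be literally
     *  empty, so this placeholder returns a dummy value. */
    static int objectAsAddress(Object ref) {
        return -1; // never meant to execute
    }
}

/** Toy model of the decision a compiler makes at each call site. */
final class MagicCallChecker {
    // Assumption: authorization is modeled as an explicit allow-list of callers.
    private final Set<String> authorizedCallers;

    MagicCallChecker(Set<String> authorizedCallers) {
        this.authorizedCallers = Set.copyOf(authorizedCallers);
    }

    /** Returns a description of what would be emitted for the call. */
    String compileCall(String callerClass, String calleeClass, String calleeMethod) {
        if (!calleeClass.equals("MagicSketch")) {
            return "call " + calleeClass + "." + calleeMethod; // ordinary call
        }
        if (!authorizedCallers.contains(callerClass)) {
            throw new SecurityException(callerClass + " may not use " + calleeMethod);
        }
        return "inline <" + calleeMethod + ">"; // bytecodes of the body are ignored
    }
}
```

With an allow-list containing only the hypothetical VM_Allocator, a call to objectAsAddress from VM_Allocator is replaced by inline code, while the same call from a user class is rejected at compile time.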

Code that needs to exploit the MAGIC class must
do so with extreme caution. The rules that are being
circumvented are there for a reason. Certain operations
require great care. Computing with raw addresses
is particularly delicate. The MAGIC method
objectAsAddress transmutes an object reference into
a raw address (an int). This functionality is needed,
for instance, to perform dynamic linking. It is, however,
problematic. Jalapeno's copying memory managers
update object references when they move the
referenced object, but raw addresses are not updated.
Care must be taken to avoid garbage collection when
computing with raw addresses lest a copying collector
invalidate them. This is prevented by calling a
method that disables garbage collection.

A thread that has disabled garbage collection can- 
not try to create an object, because the system would 
hang if there were insufficient memory. (Note that
other threads are free to request memory. If it is un- 
available, these threads are delayed and a collection 
will be initiated as soon as garbage collection is re- 
enabled.) There are subtle implications of this re- 
striction. Classes cannot be loaded, since objects are 
created during class loading. This means dynamic 
linking must be avoided. Type casts (and stores into 
object arrays) cannot be allowed either, since these 
might also entail class loading. Similarly, if the thread 
were to try to enter a monitor on a shared object 
currently owned by a thread waiting for garbage col- 
lection, the system would be in deadlock. Thus, a 
thread must operate in a tightly restricted subset of 
Java capability when computing with raw addresses. 

It would also be somewhat problematic for a thread
to yield (explicitly or implicitly) while its garbage
collection is disabled. Such a yield might arbitrarily
delay needed garbage collection. Implicit thread switching
is postponed (and explicit thread switching
prohibited) while a thread's garbage collection is
disabled.
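
The discipline described in the preceding paragraphs can be modeled with a small guard. The sketch below is not Jalapeno's implementation: the class GcDisableSketch, its method names, the per-thread nesting counter, and the use of exceptions to flag forbidden operations are all assumptions chosen for illustration. It shows only the pattern of disabling collection around a raw-address computation and refusing allocation and explicit yields in between.

```java
/** Toy model of the restrictions on a thread that has disabled garbage collection. */
final class GcDisableSketch {
    // Assumption: a per-thread nesting depth. The text only says that a
    // method is called to disable garbage collection.
    private static final ThreadLocal<Integer> depth = ThreadLocal.withInitial(() -> 0);

    private GcDisableSketch() {}

    static void disableGC() { depth.set(depth.get() + 1); }

    static void enableGC() {
        if (depth.get() == 0) throw new IllegalStateException("GC is not disabled");
        depth.set(depth.get() - 1);
    }

    static boolean gcDisabled() { return depth.get() > 0; }

    /** Stand-in for any operation that may create an object. */
    static byte[] allocate(int size) {
        if (gcDisabled()) throw new IllegalStateException("allocation while GC disabled");
        return new byte[size];
    }

    /** Explicit thread switching is prohibited while GC is disabled. */
    static void yieldThread() {
        if (gcDisabled()) throw new IllegalStateException("yield while GC disabled");
        Thread.yield();
    }

    /** Runs work with GC disabled and always re-enables it afterwards. */
    static void withRawAddresses(Runnable work) {
        disableGC();
        try {
            work.run();
        } finally {
            enableGC();
        }
    }
}
```

Wrapping the critical region in try/finally keeps the disabled window as short as possible, which matters because other threads requesting memory are delayed until collection is re-enabled.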

There are approximately 650 Java classes in the Jalapeno
system, of which approximately 110 access the
MAGIC class. Of these only 12 classes need to disable
garbage collection.

Appendix B: Getting started 

A fairly substantial set of services — a class loader, 
an object allocator, a compiler — must exist before 
a Jvm can load all remaining services required for
normal operation. The initial services for a Jvm writ- 
ten in native code, or a Jvm that runs on top of an- 
other Jvm, are available from an underlying run-time 
routine. Jalapeno is not written in native code and
it has no underlying run-time routines. Therefore, 
we assemble the essential core services into an ex- 
ecutable boot image prior to running the Jvm. This 
boot image is a snapshot of a Jalapeno virtual ma- 
chine written into a file. Later, this file is loaded into 
memory and executed. 

The boot image is created by a Java program called 
a boot-image writer. It constructs a mock-up of a run- 
ning Jalapeno virtual machine and then packages it 
into a boot image. The boot-image writer is an or- 
dinary Java program and it can run on any Jvm. The 
Jvm that runs the boot-image writer will be called
the source Jvm, and the resulting Jalapeno virtual
machine, the target Jvm.

The boot-image writer resembles a cross compiler
and linker: it compiles bytecodes to machine code 
and rewrites machine addresses to bind program 
components into a runnable image. However, since 
Jalapeno's compilers, class loaders, and run-time 
data structures are all in Java code, unlike most com- 
pilers, it must also bind "live" objects into the boot 
image. 

The boot-image writer instantiates, in the source 
Jvm, Java objects that represent the target Jvm. Then 
it uses Java's built-in reflection facility to translate 
these mock-up objects from the object model of the 
source Jvm to Jalapeno's object model. This self-referencing
aspect of the boot-image writer makes it
relatively simple — it is really just an object model 
translator. 

Since Jalapeno is a Java program, each of its components
is a Java object and the boot-image writer
can construct the mock-up by executing special init
methods in each of Jalapeno's major subsystems. A
customized class loader makes sure that any classes
needed to execute this code are loaded into the
mock-up as well as into the source Jvm. As a class
is loaded, its methods are compiled (by a Jalapeno
compiler running in the source Jvm) and included
in the mock-up.

This strategy of loading classes into both the source
Jvm and its mock-up of the target Jvm requires a
complete class list to succeed. If, when Jalapeno
starts running, a method of the core run-time environment
references any class not in the boot image,
an endless recursion results: the run-time environment
needs to load part of itself in order to load part
of itself . . . and so on.

The problem of determining the minimal set of 
classes needed in the mock-up to prevent this was 
solved using a combination of careful planning and 
trial and error. All of Jalapeno's core classes were 
named with a VM_ prefix. These are the classes 
needed to provide enough machinery to allow the 
virtual machine to perform compilation, memory 
management, and dynamic class loading. The special
prefix is recognized by Jalapeno's compilers and
used to suppress normal dynamic linking rules: they
never generate dynamic linking code between methods
whose classes have this prefix. The core classes
were also carefully written to avoid unnecessary use
of Java library classes. The fundamental classes
(java.lang.Object, java.lang.Class, java.lang.String, and
a few I/O classes) were unavoidable exceptions.
Together, the VM_ classes and fundamental Java classes
formed a starting set of classes that we thought 
needed to appear in the boot image. 

A small number of additional dependencies (for ex- 
ample, Integer, Float, Double, and various array and 
exception classes) were then identified by trial and 
error. We built a boot image and attempted to ex- 
ecute it. If it crashed trying to (recursively) load class 
X, then we added X to the list of classes written into
the boot image and repeated the exercise. This pro- 
cess converged with a small number of retries and 
did not prove to be a maintenance problem once the 
implementation of the core VM_ classes stabilized. 
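
The trial-and-error procedure just described amounts to a simple fixed-point loop, which can be simulated in Java. This sketch is an assumption-laden illustration, not the authors' tooling: the class BootClassListSketch, the dependency map standing in for "what a class tries to load at boot", and the class name VM_Compiler used in the usage note are all hypothetical. A real boot attempt would crash rather than return a value.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

final class BootClassListSketch {
    private BootClassListSketch() {}

    /**
     * Simulates one boot attempt. Returns the first class that some class in
     * the image references but the image lacks, or empty if the boot succeeds.
     */
    static Optional<String> tryBoot(Set<String> image, Map<String, List<String>> uses) {
        for (String cls : new TreeSet<>(image)) {            // deterministic order
            for (String dep : uses.getOrDefault(cls, List.of())) {
                if (!image.contains(dep)) {
                    return Optional.of(dep);
                }
            }
        }
        return Optional.empty();
    }

    /** Repeats build-and-boot, adding the missing class after each failure. */
    static Set<String> converge(Set<String> startingSet, Map<String, List<String>> uses) {
        Set<String> image = new TreeSet<>(startingSet);
        Optional<String> missing = tryBoot(image, uses);
        while (missing.isPresent()) {
            image.add(missing.get());       // each retry grows the list by one class
            missing = tryBoot(image, uses);
        }
        return image;
    }
}
```

Starting from a set containing the hypothetical VM_Compiler and java.lang.Object, with VM_Compiler referencing java.lang.Integer and Integer referencing java.lang.Number, the loop adds Integer and then Number and stops. Because every retry adds a class that was not yet present, the loop terminates once the list is closed under the references exercised at boot.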

When the mock-up is complete, it is transformed into
a boot image. This involves finding all the objects in
the mock-up, converting them to Jalapeno's object
format, and storing them in a boot-image array. All
components of a running Jalapeno virtual machine
can be reached from a single JTOC array (see section
on static fields and methods). In the mock-up, the
JTOC is encoded by three parallel arrays: an array of
ints (for primitive values), an array of Object instances
(for references), and a Boolean array to discriminate
between the two. The structure rooted in the
JTOC array is walked recursively and the values, both
reference and primitive, encountered are translated
into the boot-image array. Since the type information
[...] loaded class is referenced from the JTOC, all
necessary compiled method bodies will be included in the
boot image.
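
The three-array encoding of the mock-up JTOC can be pictured as follows. This is a minimal sketch under stated assumptions: the class MockJtocSketch and its method names are invented for the example, the traversal uses an explicit work list rather than recursion, and it follows only the elements of object arrays (following object fields would need reflection and is omitted here).

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

/** Mock-up JTOC encoded as three parallel arrays. */
final class MockJtocSketch {
    final int[] ints;       // primitive values
    final Object[] refs;    // object references
    final boolean[] isRef;  // true where the slot holds a reference

    MockJtocSketch(int slots) {
        ints = new int[slots];
        refs = new Object[slots];
        isRef = new boolean[slots];
    }

    void setInt(int slot, int value) {
        ints[slot] = value;
        refs[slot] = null;
        isRef[slot] = false;
    }

    void setRef(int slot, Object value) {
        refs[slot] = value;
        ints[slot] = 0;
        isRef[slot] = true;
    }

    /** Collects every object reachable from the reference slots, by identity. */
    Set<Object> reachable() {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Object> work = new ArrayDeque<>();
        for (int i = 0; i < refs.length; i++) {
            if (isRef[i] && refs[i] != null && seen.add(refs[i])) {
                work.push(refs[i]);
            }
        }
        while (!work.isEmpty()) {
            Object o = work.pop();
            if (o instanceof Object[]) {   // only object arrays are followed in this sketch
                for (Object e : (Object[]) o) {
                    if (e != null && seen.add(e)) {
                        work.push(e);
                    }
                }
            }
        }
        return seen;
    }
}
```

The identity-based set matters: two distinct mock-up objects that happen to be equal must still occupy separate places in the boot image.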

The translation process uses reflection. The boot-image
writer obtains the java.lang.Class object for
each object in the mock-up and iterates over the 
fields returned by the getFields method. For each 
field, it extracts the field value from the source ob- 
ject and extracts the target field offset from Jalape- 
no's class description for the object. Then, it writes 
the value at that offset from the index of the object 
in the boot image. When object references are en- 
countered, we cannot use any value from the mock- 
up. The references in the mock-up are converted to 
boot-image addresses using a hash table maintained 
as boot-image space is allocated. (An array contain- 
ing the addresses of all references in the boot image 
can be included in the boot image to support relo- 
cation of the image at boot time.) 
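
A field-by-field copy of this kind can be sketched with the standard reflection API. The sketch is hypothetical throughout: FieldCopySketch, SampleNode, the name-sorted word layout, and the int array standing in for the boot image are assumptions, and only int and reference fields declared directly in the object's class are handled. It uses getDeclaredFields with setAccessible rather than getFields, because getFields returns only public fields; that is a substitution made for this example, not a claim about the original code.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical mock-up object type used only by this example. */
final class SampleNode {
    private final int value;
    private final SampleNode next;
    SampleNode(int value, SampleNode next) { this.value = value; this.next = next; }
}

final class FieldCopySketch {
    private FieldCopySketch() {}

    /** Assumed target layout: instance fields sorted by name, one word each. */
    static Map<String, Integer> layoutOf(Class<?> c) {
        List<String> names = new ArrayList<>();
        for (Field f : c.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers()) && !f.isSynthetic()) {
                names.add(f.getName());
            }
        }
        Collections.sort(names);
        Map<String, Integer> offsets = new HashMap<>();
        for (int i = 0; i < names.size(); i++) {
            offsets.put(names.get(i), i);
        }
        return offsets;
    }

    /**
     * Copies src into image at index base. References are not copied as-is:
     * they are replaced by the image index recorded in addressOf (null becomes -1).
     * addressOf should be identity-based, e.g. an IdentityHashMap.
     */
    static void copy(Object src, int[] image, int base, Map<Object, Integer> addressOf) {
        Map<String, Integer> layout = layoutOf(src.getClass());
        for (Field f : src.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
            f.setAccessible(true); // allows reading private fields
            int slot = base + layout.get(f.getName());
            try {
                if (f.getType() == int.class) {
                    image[slot] = f.getInt(src);
                } else if (!f.getType().isPrimitive()) {
                    Object target = f.get(src);
                    Integer addr = (target == null) ? Integer.valueOf(-1) : addressOf.get(target);
                    if (addr == null) {
                        throw new IllegalStateException("no boot-image address for referenced object");
                    }
                    image[slot] = addr;
                } else {
                    throw new IllegalArgumentException("sketch handles only int and reference fields");
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
    }
}
```

The addressOf map plays the role of the hash table that records where each mock-up object was placed, so that a reference in the source object becomes an address in the image rather than a value taken from the mock-up.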

Overall the boot-image writer copies Java objects,
field by field, from the mock-up into the boot image,
simultaneously translating from the source Jvm's
to the target Jvm's object model. Relying on Java's
reflection capability, we ran into one inconvenience: 
Sun's Java Development Kit, v 1.1.4 did not permit 
reflective access to private fields. This is not a prob- 
lem in the Java 2 Software Development Kit, which 
allows such access. We solved the problem in the ear- 
lier version by preprocessing the class files, turning 
the private bits off. 

In addition to the objects reachable from the JTOC 
array, two other objects are needed in the boot im- 
age: an initial thread object containing an empty 
stack ready to run the first instruction of the boot( ) 
method when Jalapeno starts up and a "boot record" 
to interface the boot image with the boot-image run- 
ner (described next). This boot record contains the 
start, end, and last-used addresses in the image, four 
register values used to start Jalapeno, the address
of the boot() method, and the addresses for AIX's
system calls. When these values are stored in the
boot-image array, it is written to disk.

A short program called a boot-image runner starts
Jalapeno running. It reads the boot image into memory,
sets the four registers to the indicated values,
and branches to the boot() method. The boot-image
runner is written in C (with a little assembler to
set the registers and perform the final branch), not
Java code, so it does not require a Jvm to run on.

When the boot() method starts executing, the virtual
machine is in a fragile state: it can run a single
thread of machine instructions, but it has not yet created
the external operating system resources it needs
to support its own execution. These operating system
resources cannot be created by the boot-image
writer, because they refer to external state that will
not exist until the boot image is executed. Thus,
Jalapeno must perform additional initialization.

At boot time, the virtual machine initializes
hardware-specific addresses (for example, it will eventually
establish a hardware guard page on its own
stack), opens files corresponding to the Java library's
System.in, System.out, and System.err stream objects,
parses command line arguments, and creates
a System.Properties object corresponding to the current
execution environment. Then, the multithreading
subsystem is initialized by creating operating system
threads to serve as the virtual processors upon
which Java threads are multiplexed. Finally, timer 
interrupts are enabled to support thread preemp- 
tion and a Java thread is spawned to run the appli- 
cation program specified on the command line. 

Jalapeno runs until the last (nondaemon) Java thread
terminates or System.exit() is called.

*Trademark or registered trademark of International Business
Machines Corporation.

**Trademark or registered trademark of Sun Microsystems, Inc.
or Transaction Processing Performance Council.
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